Boundary Crossing Probabilities and Statistical Applications

David  Siegmund 
Stanford University

Abstract 

This paper surveys recent results involving boundary crossing probabilities and related
statistical applications. The first part is concerned with problems of sequential analysis,
especially repeated significance tests and their application to sequential clinical trials involving
survival data. The second part develops the probability theory motivated by the problems of
Part 1. A method for computing first passage distributions of Brownian motion to linear
boundaries is introduced and then modified to handle problems in discrete time and those involving
nonlinear boundaries. The third part is concerned with fixed sample statistical problems,
especially change-point problems, which involve boundary crossing probabilities. Examples are
given of problems for which the methods of Part 2 appear adequate and of problems which
require new methods.


0.  Introduction 

Let $X(t)$, $t = 1, 2, \ldots$ or $0 < t < \infty$, be a stochastic process and let $c(t)$ be
constants. The general subject of this paper is approximate computation of boundary crossing
probabilities of the form

(0.1)  $P\{X(t) > c(t) \text{ for some } m_0 < t < m\}$

or

(0.2)  $P\{X(t) > c(t) \text{ for some } m_0 < t < m \mid X(m) = \xi\}$

and  statistical  applications  of  the  resulting  approximations. 

The grandfather of all such problems in statistics is to determine the distribution of
the one sample Kolmogorov-Smirnov statistic, which is of the form (0.1) with $X(t)$ the
difference between the empirical and true distribution function of a random sample and
$c(t)$ identically constant. The limiting distribution of this statistic is of the form (0.2)
with $X(t)$ a Brownian motion process, $c(t)$ identically constant, and $\xi = 0$. The principal
contemporary motivation for studying such problems comes from sequential analysis, which
is the context in which many of the results discussed below first arose.

The  paper  is  divided  into  three  parts.  The  first  is  concerned  with  a  class  of  problems 
in  sequential  analysis  which  lead  naturally  to  problems  of  the  form  (0.1)  and  (0.2).  The 
discussion  in  Part  1  is  restricted  to  statistical  issues.  We  shall  in  effect  assume  that  we 
can compute without difficulty various boundary crossing probabilities that arise. However,
these problems motivate the second part of the paper, which is concerned with the
mathematical problem of approximation of boundary crossing probabilities. The third part
discusses a number of non-sequential statistical problems which also lead to boundary crossing
probabilities, some of which are essentially already solved in Part 2, and some of which
require the development of new methods. Particular attention is given to so-called “change
point” problems. The results here are in some respects less complete and outline a program
for future research.

Because  the  subject  of  boundary  crossing  probabilities  is  quite  technical,  to  convey  the 
main  ideas  the  following  discussion  is  frequently  heuristic  and  restricted  to  special  cases. 
References  are  given  to  mathematically  rigorous  treatments.  I  have  written  a  monograph 
on  sequential  analysis  (Siegmund,  1985),  which  describes  in  substantially  more  detail  most 
of  the  results  of  Parts  1  and  2. 


1.  Sequential  Analysis 

The  primary  impetus  for  the  development  of  sequential  analysis  during  the  1940’s  was 
a  desire  for  more  efficient  methods  of  sampling  inspection.  Recent  developments  have  been 
motivated  at  least  in  part  by  ethical  considerations  in  the  design  of  clinical  trials. 

1.1  Repeated  Significance  Tests  for  Normal  Data 

We shall consider in detail the following very simple model of a clinical trial. In order
to compare two treatments, A and B, patients arrive sequentially and are paired, with one
patient in each pair receiving treatment A and the other treatment B. Let $a_i$ ($b_i$) denote
the (immediate) response of the patient in the $i$th pair who receives treatment A (B), and
let $x_i = a_i - b_i$. Assume that $x_1, x_2, \ldots$ are independent and normally distributed random
variables with mean $\mu$ and known variance, which without loss of generality can be assumed
equal to 1. Our primary goal is to test the hypothesis of no treatment effect, $H_0 : \mu = 0$,
against the alternative $H_1 : \mu \ne 0$. Of course the standard fixed sample size test (at level
.05) based on a sample of size $n$ is to compute $S_n = x_1 + \cdots + x_n$ and reject $H_0$ if
$|S_n| > b' n^{1/2}$ ($b' = 1.96$).

If $\mu$ is considerably different from 0, indicating that one of the two treatments is
considerably superior to the other, it is desirable to ascertain this fact with a minimum
amount of experimentation, so that all future patients can receive the (apparently) superior
treatment. On the other hand if $\mu$ is about equal to 0, there is no ethical mandate (although
there may be a financial one) to stop sampling as soon as possible. A sequential test designed
to stop sampling as soon as it is apparent that $H_1$ is true while behaving like a fixed sample
test if $H_0$ appears to be true is the so-called repeated significance test, defined as follows
(cf. Armitage, 1975).

Given $m_0$, $m$, and $b > 0$, define the stopping rule

(1.1)  $T = \inf\{n : n \ge m_0,\ |S_n| > b n^{1/2}\}$.

Stop sampling at $\min(T, m)$ and reject $H_0$ if and only if $T \le m$. The power of this test is

(1.2)  $P_\mu\{T \le m\} = P_\mu\left( \bigcup_{n=m_0}^{m} \{|S_n| > b n^{1/2}\} \right)$.

Its expected sample size is

(1.3)  $E_\mu[\min(T, m)]$,

which we anticipate will be small when $|\mu| \gg 0$ and about equal to $m$ when $\mu = 0$.
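
The quantities (1.2) and (1.3) can be explored by direct simulation. The following is a minimal Monte Carlo sketch, not the approximation method of Part 2; the function name `simulate_rst`, the number of replications, and the seed are illustrative assumptions.

```python
import math
import random


def simulate_rst(mu, b, m0, m, n_sims=20000, seed=0):
    """Monte Carlo estimate of the power (1.2) and expected sample size (1.3)
    of the repeated significance test with stopping rule (1.1).

    Observations are N(mu, 1). n_sims and seed are illustrative choices."""
    rng = random.Random(seed)
    rejections = 0
    total_n = 0
    for _ in range(n_sims):
        s = 0.0
        stopped_at = m          # sample size if the boundary is never crossed
        rejected = False
        for n in range(1, m + 1):
            s += rng.gauss(mu, 1.0)
            # Stopping rule (1.1): first n >= m0 with |S_n| > b * sqrt(n).
            if n >= m0 and abs(s) > b * math.sqrt(n):
                stopped_at = n
                rejected = True
                break
        rejections += rejected
        total_n += stopped_at
    return rejections / n_sims, total_n / n_sims
```

For example, `simulate_rst(0.0, b=2.413, m0=1, m=5)` estimates the significance level and the expected sample size under $H_0$ for the design of Table 1; the estimates carry Monte Carlo error and should only roughly agree with the tabulated approximations.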
Remark: Note that the stopping rule (1.1) can be written

$T = \inf\{n : n \ge m_0,\ S_n^2 / 2n > b^2/2\}$,

or

(1.4)  $T = \inf\{n : n \ge m_0,\ \Lambda_n > a\}$,

where $\Lambda_n$ is the log likelihood ratio statistic for testing $\mu = 0$ against $\mu \ne 0$, and $a = b^2/2$.
This observation is very helpful in adapting the results developed here to different situations.
For example, if we drop the hypothesis that the variance of the $x$'s is known, (1.4) becomes

$T = \inf\{n : n \ge m_0,\ (n/2) \log(1 + \bar{x}_n^2 / v_n^2) > a\}$,

where $\bar{x}_n = n^{-1} \sum_{i=1}^{n} x_i$ and $v_n^2 = n^{-1} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$. Much of the theory developed for tests
based on (1.1) in normal families can be adapted to tests defined by (1.4) in multiparameter
exponential families (Woodroofe, 1978, Lalley, 1983, Hu, 1985).
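
The equivalence between the square root boundary in (1.1) and the likelihood ratio form (1.4) can be checked numerically. The sketch below is only an illustration; the function names are hypothetical, and the unknown-variance statistic assumes the observations are not all equal, so that the variance estimate is positive.

```python
import math


def llr_known_variance(xs):
    """Log likelihood ratio statistic S_n^2 / (2n) for unit-variance normal data."""
    n = len(xs)
    s = sum(xs)
    return s * s / (2.0 * n)


def llr_unknown_variance(xs):
    """(n/2) log(1 + xbar^2 / v^2), where v^2 is the variance estimate with divisor n.

    Assumes the observations are not all identical (v^2 > 0)."""
    n = len(xs)
    xbar = sum(xs) / n
    v2 = sum((x - xbar) ** 2 for x in xs) / n
    return 0.5 * n * math.log(1.0 + xbar * xbar / v2)


def crosses_sqrt_boundary(xs, b):
    """The event |S_n| > b n^{1/2} appearing in (1.1)."""
    return abs(sum(xs)) > b * math.sqrt(len(xs))
```

For any data and any $b > 0$, `crosses_sqrt_boundary(xs, b)` agrees with `llr_known_variance(xs) > b * b / 2`, which is the content of the remark.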

In large multi-center clinical trials it does not appear feasible to monitor the accumulating
data continuously, so it is convenient to consider also “group” repeated significance
tests, in which we suppose that each “observation” $x_i$ is actually the sum of several, say
$k$, observations which constitute the $i$th group. This does not affect the theoretical
developments that follow, since $x_i$ is still normally distributed (and may be approximately
normally distributed even if the individual observations are not), but it does mean that a
small value of the parameter $m$ can represent a large sample size if the group size $k$ is large.
Also, for a group sequential test with group size $k$ the “real” expected sample size is $k$
times the quantity in (1.3). As we shall see below, the group size $k$ typically does not have
a significant effect on the operating characteristics of a sequential test. See also Pocock
(1977).

Before presenting a numerical example, it is convenient to introduce a modification of
the repeated significance test defined above, which does have an important impact on its
operating characteristics. As Table 1 below illustrates, a repeated significance test can have
a much smaller expected sample size for large $|\mu|$ than a fixed sample test of sample size $m$,
but the price it pays is a considerable loss of power. To recapture most of this lost power
at a relatively small increase in the expected sample size, consider the following family of
tests which interpolate between fixed sample tests and repeated significance tests. Given
$0 < c \le b$, let $T$ be defined by (1.1). Stop sampling at $\min(T, m)$ and reject $H_0$ if either
$T \le m$ or $T > m$ and $|S_m| > c m^{1/2}$. The power of this test is

(1.5)  $P_\mu\{T \le m\} + P_\mu\{T > m,\ |S_m| > c m^{1/2}\}$
       $= P_\mu\{|S_m| > c m^{1/2}\} + P_\mu\{T \le m,\ |S_m| \le c m^{1/2}\}$.

Of course the value of $b$ must now be somewhat larger than previously if the overall
significance level is to be unchanged, but by taking $c$ only slightly larger than the rejection
level of a fixed sample test, one makes the power essentially equal to the first term on the
right hand side of (1.5), which in this case is about the same as for the fixed sample test.
See Figure 1 and Table 2 below. This modification of the repeated significance test was
suggested independently, with varying motivation, by Haybittle (1971), Peto et al. (1976),
and Siegmund (1978).
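
The two expressions for the power in (1.5) agree path by path, which a small simulation makes concrete. The sketch below is an illustration under stated assumptions: the function name, replication count, and seed are hypothetical, and each path is followed to time $m$ even after the boundary is crossed, purely so that $S_m$ can be observed for the right hand side of (1.5).

```python
import math
import random


def simulate_modified_rst(mu, b, c, m0, m, n_sims=40000, seed=2):
    """Estimate the four probabilities appearing in (1.5) on common sample paths.

    Returns relative frequencies of
      'stop'       : {T <= m}
      'late'       : {T > m, |S_m| > c sqrt(m)}
      'fixed'      : {|S_m| > c sqrt(m)}
      'stop_small' : {T <= m, |S_m| <= c sqrt(m)}"""
    rng = random.Random(seed)
    counts = {"stop": 0, "late": 0, "fixed": 0, "stop_small": 0}
    for _ in range(n_sims):
        s = 0.0
        crossed = False
        for n in range(1, m + 1):
            s += rng.gauss(mu, 1.0)
            if n >= m0 and abs(s) > b * math.sqrt(n):
                crossed = True  # T <= m; keep going only to observe S_m
        big = abs(s) > c * math.sqrt(m)
        counts["stop"] += crossed
        counts["late"] += (not crossed) and big
        counts["fixed"] += big
        counts["stop_small"] += crossed and (not big)
    return {key: value / n_sims for key, value in counts.items()}
```

With `r = simulate_modified_rst(0.0, b=2.7, c=2.04, m0=1, m=5)`, the sum `r["stop"] + r["late"]` estimates the significance level of the test of Table 2 and equals `r["fixed"] + r["stop_small"]` up to rounding.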

Tables 1-3 contain numerical examples. The power function has been approximated
according to the suggestions in Part 2 of this paper. For comparison, results obtained by
Pocock (1977) by iterative numerical integration are included in parentheses, when available.
Approximations to the expected sample size are computed according to the suggestion of
Siegmund (1985, equation (4.42)), which is not discussed here. Those cells which contain an
asterisk are combinations of $b$, $m$, and $\mu$ for which the approximation to the expected sample
size is poor. Table 1 is concerned with a repeated significance test having power function
given by (1.2). It is easy to see that there is a substantial savings in the expected sample
size when $|\mu| \gg 0$ compared to a fixed sample test taking $m$ observations. To document the
loss of power of the repeated significance test, the power of a fixed sample test taking $m$
observations is also included in the table. Table 2 is concerned with a modified test having
power function given by (1.5). Now there is essentially no loss of power, but still a quite
considerable savings in the expected sample size. In order to compare Table 3 with Table 2,
one should think of Table 2 as defining a group sequential test with $k = 10$ observations per
group. Then the values given for $\mu$ in the two tables are comparable (i.e. a value in Table 3
equals the corresponding value in Table 2 divided by $k^{1/2} = 3.16$); and the expected sample
sizes are comparable if one multiplies the entries in Table 2 by $k = 10$. To the accuracy
of the approximations used, the group test has the same power function and just a slightly
larger expected sample size than the test which inspects the data continuously.

Table 1

Repeated Significance Test

$b = 2.413$, $m_0 = 1$, $m = 5$, $\alpha \cong .049$ (.05)

  $\mu$     Power (1.2)    $E_\mu(T \wedge m)$    Power of Fixed Sample Test
  2.071   .99 (.99)      1.93 (2.05)      1.00
  1.759   .95 (.95)      2.43 (2.53)       .98
  1.592   .91 (.90)      2.76 (2.84)       .95
  1.311   .76 (.75)      3.35 (3.41)       .83
   .994   .52 (.50)      4.02 (4.04)       .60

Parenthetical entries from Pocock (1977)


Table 2

Modified Repeated Significance Test

$b = 2.7$, $c = 2.04$, $m_0 = 1$, $m = 5$, $\alpha \cong .0504$

  $\mu$     Power (1.5)    $E_\mu(T \wedge m)$    $P_\mu\{T \le m\}$
  2.071   1.00           2.26             .98
  1.759    .97           2.83             .91
  1.592    .94           3.18             .84
  1.311    .82           3.76             .66
   .994    .59           *                .40

Table 3

Modified Repeated Significance Test

$b = 2.91$, $c = 2.05$, $m_0 = 10$, $m = 50$, $\alpha \cong .0503$

  $\mu$          Power          $E_\mu(T \wedge m)$    $P_\mu\{T \le m\}$
  .655         [illegible]    19               .97
  .556         .97            25               .89
  [illegible]  .94            29               .81
  .415         .82            35               .62
  .314         .59            *                .36

Remark 1.6. It is easy to devise other tests which behave about the same as the modified
repeated significance test discussed here. One possibility, suggested independently by Miller
(1970), Samuel-Cahn (1974), and O’Brien and Fleming (1979), is to stop at $\min(N, m)$,
where $N = \min\{n : |S_n| > B\}$, and to reject $H_0 : \mu = 0$ if $N \le m$. While the properties
of these tests are similar to those of the modified repeated significance tests defined above,
they appear to have some disadvantages. For example, their overall significance level is more
sensitive to the choice of $m$, which makes them less flexible in adjusting to an unanticipated
change in the maximum sample size once an experiment has begun. See Siegmund (1985).

A modified repeated significance test is designed to produce a fixed sample size $m$
unless there is a substantial treatment effect as measured by the parameter of primary
interest, $\mu$. We assume that if $|\mu|$ is large, the preference for one treatment is so strong that
other considerations are essentially irrelevant. However, there typically are other measures
of treatment effect which one wants to explore, especially if $\mu \cong 0$; but because of their
secondary importance they do not enter into the definition of the stopping rule. There are
undoubtedly also cases where if $\mu$ is close to 0, one would like to terminate the experiment
as soon as possible because of economic considerations.

One can easily obtain reasonable tests which provide for early termination when $H_0$
appears to be true by splicing together “one-sided” tests. For example, we consider initially
a modified repeated significance test of $H_0 : \mu = 0$ against $H_1 : \mu > 0$ defined by the
stopping time

$T_1 = \inf\{n : n \ge m_{01},\ S_n > b_1 n^{1/2}\}$

with rejection of $H_0$ if $T_1 \le m$ or $T_1 > m$ and $S_m > c m^{1/2}$. Now consider adding a lower
stopping boundary

(1.7)  $T_2 = \inf\{n : n \ge m_{02},\ S_n < -b_2 n^{1/2} + \delta n\}$  ($\delta > 0$),

and define a new test which stops sampling at $T_1 \wedge T_2 \wedge m$ with rejection of $H_0$ if $T_1 \le T_2 \wedge m$
or $T_1 \wedge T_2 > m$ and $S_m > c m^{1/2}$. (Here we are assuming that $-b_2 m^{1/2} + \delta m < c m^{1/2}$.)
Presumably $\delta$ is chosen to be a positive treatment effect which it is important to detect.
Since one hopes to accomplish something different with $T_2$ than with $T_1$, there is no obvious
reason that the lower boundary should have the same shape as the upper boundary, or if it
has, that $b_1$ and $b_2$ should have any particular relation. Nevertheless, for the convenience
of this theoretical discussion, we assume that $m_{01} = m_{02} = m_0$ and $b_1 = b_2 = b$, say.

The power of this test is

(1.8)  $P_\mu\{T_1 \le T_2 \wedge m\} + P_\mu\{T_1 \wedge T_2 > m,\ S_m > c m^{1/2}\}$,

which is difficult to compute exactly, but which usually is easily approximated by results
developed to deal with (1.5). One approximation to (1.8) is

(1.9)  $P_\mu\{S_m > c m^{1/2}\} + P_\mu\{T_1 \le m,\ S_m \le c m^{1/2}\} - P_\mu\{T_2 \le m,\ S_m > c m^{1/2}\}$.

It may be shown that the difference between (1.8) and (1.9) is

$P_\mu\{T_1 < T_2 \le m,\ S_m > c m^{1/2}\} - P_\mu\{T_2 < T_1 \le m,\ S_m \le c m^{1/2}\}$,

which involves sample paths which first cross one stopping boundary, then the other, and
have partially crossed the continuation region again by time $m$. These probabilities are
usually insignificantly small unless $m$ is close to the point where the upper and lower
boundaries meet, in which case $m$ can probably be reduced without adversely affecting
the overall properties of the test (cf. Anderson, 1960). For the somewhat simpler case
of a truncated sequential probability ratio test, Siegmund (1985, III.6) shows that the
corresponding approximation is a good one.

For a numerical example consider the test of Table 2, which has a significance level of
about .025 as a one-sided test against $H_1 : \mu > 0$. If we now introduce a lower stopping
boundary (1.7) with $\delta = 1.759$ and decrease $c$ slightly to 2.02, the approximation (1.9)
indicates that the significance level of the new test is again about .025 and the power at
$\mu = 1.759$ is still .97. At $\mu = .994$ the power is about .58, so introduction of the lower
stopping boundary appears to lead to a negligible loss of power. On the other hand, the
expected sample size when $\mu = 0$ is roughly the same as the expected sample size in Table
2 for $\mu = 1.759$, or about 2.8. This is a considerable reduction from the expected sample
size of a repeated significance test, which is just slightly less than the maximum sample
size, $m = 5$.
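
The effect of the lower boundary (1.7) can be examined by simulation. The following sketch assumes $m_{01} = m_{02} = m_0$ and $b_1 = b_2 = b$, as in the text; the function name, replication count, and seed are illustrative assumptions, and the output is a Monte Carlo estimate rather than the approximation (1.9).

```python
import math
import random


def simulate_two_boundary_test(mu, b, c, delta, m0, m, n_sims=20000, seed=1):
    """Monte Carlo estimate of (rejection probability, expected sample size)
    for the one-sided test with upper boundary b*sqrt(n) and lower boundary
    -b*sqrt(n) + delta*n, truncated at m with final critical value c*sqrt(m).

    Assumes -b*sqrt(m) + delta*m < c*sqrt(m), as required in the text."""
    rng = random.Random(seed)
    rejections = 0
    total_n = 0
    for _ in range(n_sims):
        s = 0.0
        n_stop = m
        reject = None  # undecided until a boundary is crossed or time m is reached
        for n in range(1, m + 1):
            s += rng.gauss(mu, 1.0)
            if n < m0:
                continue
            root_n = math.sqrt(n)
            if s > b * root_n:                  # T_1 = n: stop and reject H_0
                n_stop, reject = n, True
                break
            if s < -b * root_n + delta * n:     # T_2 = n: stop and accept H_0
                n_stop, reject = n, False
                break
        if reject is None:                      # T_1 ^ T_2 > m: fixed sample decision
            reject = s > c * math.sqrt(m)
        rejections += reject
        total_n += n_stop
    return rejections / n_sims, total_n / n_sims
```

Calling `simulate_two_boundary_test(0.0, b=2.7, c=2.02, delta=1.759, m0=1, m=5)` gives estimates of the one-sided significance level and of the expected sample size under $H_0$ that can be compared with the values of about .025 and 2.8 quoted above.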

In recent years various authors have attempted to define attained significance levels,
or p-values, and confidence sets relative to sequential tests. Both of these concepts require
that the possible outcomes be ordered so that one knows what it means to say that one
outcome is more extreme than another. For example, suppose that we use the stopping rule
(1.1) and the test terminates at $T = n \in (m_0, m)$. It seems reasonable to say that a more
extreme result would be a sample outcome which terminates at this or a smaller value of $T$
and hence to define the (two-sided) attained significance level of the observed result to be
$P_0\{T \le n\}$. By similar reasoning, one can define a confidence interval for $\mu$. For a lower
$(1 - \alpha)100\%$ confidence bound, if $T = n \in (m_0, m]$ and $S_T > 0$, we can take for a bound
that value $\mu_*$ which satisfies

$P_{\mu_*}\{T \le n,\ S_T > 0\} = \alpha$.

The bound is defined similarly if $T = m_0$, $T > m$, etc.

For $b = 2.413$ as in Table 1, the attained significance of $T = 2$ according to the
preceding definition is about .027. Thus, in spite of the dramatic action of stopping the test
after 40% of its projected duration the evidence against $H_0$ as measured by the p-value is by
no means dramatic. The situation is somewhat different for a modified repeated significance
test if $b$ is taken sufficiently large. For $b = 2.71$ as in Table 2, $P_0\{T \le 2\} \cong .012$; and the
attained significance of any result which terminates the test before time $m = 5$ is bounded
by $P_0\{T \le 5\} \cong .023$. Of course, it would defeat the purpose of using a sequential test if
one insisted that the p-value be extremely small before stopping the experiment.
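
The attained significance levels $P_0\{T \le n\}$ quoted above can be checked by recursive numerical integration, in the spirit of the iterative integration attributed to Pocock (1977) earlier in this section. The sketch below is an assumption-laden illustration, not the approximation of Part 2: the function names, the midpoint rule, and the grid size `k` are all choices made here.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def stopping_cdf(b, m0, m, mu=0.0, k=200):
    """Return [P_mu{T <= n} for n = m0, ..., m] for the stopping rule (1.1),
    using a midpoint rule with k points on each continuation interval."""
    sd0 = math.sqrt(m0)
    edge = b * sd0
    # S_{m0} is N(m0*mu, m0); T = m0 when it falls outside (-edge, edge).
    cum = 1.0 - (Phi((edge - m0 * mu) / sd0) - Phi((-edge - m0 * mu) / sd0))
    out = [cum]
    h = 2.0 * edge / k
    grid = [-edge + (j + 0.5) * h for j in range(k)]
    # Sub-density of S_n on the event {T > n}.
    dens = [phi((s - m0 * mu) / sd0) / sd0 for s in grid]
    for n in range(m0 + 1, m + 1):
        edge_n = b * math.sqrt(n)
        # Probability of first crossing at stage n.
        cum += sum(
            g * h * (1.0 - (Phi(edge_n - s - mu) - Phi(-edge_n - s - mu)))
            for s, g in zip(grid, dens)
        )
        out.append(cum)
        if n < m:
            new_h = 2.0 * edge_n / k
            new_grid = [-edge_n + (j + 0.5) * new_h for j in range(k)]
            dens = [
                h * sum(g * phi(y - s - mu) for s, g in zip(grid, dens))
                for y in new_grid
            ]
            grid, h = new_grid, new_h
    return out
```

With `m0 = 1`, the entry `stopping_cdf(2.413, 1, 5)[1]` is the attained significance of $T = 2$ in the design of Table 1, and `stopping_cdf(2.71, 1, 5)[4]` is the bound $P_0\{T \le 5\}$ discussed above.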

It seems difficult to give a persuasive theoretical justification for the definitions
suggested here, and hence the principal argument in support of them is that several authors
have independently arrived at essentially the same definitions. Berk and Brown (1978)
discuss different alternatives. One is to order sample outcomes according to the value of
$S_{T \wedge m}/(T \wedge m)$. If one neglects excess over the stopping boundary, this definition is equivalent
to the one suggested above. However, it has the advantage that it generalises directly to
the case in which additional data become available after the experiment has terminated.
Usually these data are a small part of the total sample and have an insignificant effect on
the analysis. See Samuel-Cahn and Wax (1985) for an interesting example to the contrary.
Siegmund (1985) contains additional references and a more complete discussion.

1.2  Sequential  Survival  Analysis 

The discussion of the preceding section is extremely simplified, and to see how it
provides considerable insight for more realistic models, we consider next the possibility of using
sequential methods in clinical trials involving survival data, analyzed by the proportional
hazards model (Cox, 1972). The notation is unavoidably complicated.

Suppose that patients arrive (are born) at times $y_1, y_2, \ldots$. Associated with the $i$th
patient is a triple $(z_i, x_i, c_i)$, where $z_i$ is a covariate, $x_i$ is the length of survival (age at death),
and $c_i$ denotes the time of censoring after arrival. The assumption of the proportional
hazards model is that


$P\{x_i \in (s, s + ds) \mid x_i \ge s\} = d\Lambda_i(s) = \exp(\beta z_i) \lambda(s)\, ds$,

for some unknown parameter $\beta$ and base line hazard function $\lambda$. Also let
$R(t, s) = \{i : y_i \le t - s,\ x_i \wedge c_i \ge s\}$ denote those patients who are at risk at time $t$ and whose age (measured
from arrival) is at least $s$. Let

$N_i(t, s) = I\{y_i + x_i \le t,\ x_i \le c_i,\ x_i \le s\}$

be the indicator that the $i$th patient arrived and died before time $t$, died at an age $\le s$,
and was not censored at the epoch of death. Cox (1972, 1975) suggested that this model
be analyzed by applying likelihood methods to the log “partial” likelihood function


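$\ell(t, \beta) = \sum_i \int_{[0,t]} \left\{ \beta z_i - \log \sum_{j \in R(t,s)} \exp(\beta z_j) \right\} N_i(t, ds)$.

In particular, consider the score process $\dot\ell(t, \beta) = \partial \ell(t, \beta)/\partial \beta$, or more generally, the two
parameter process

(1.10)  $\dot\ell(t, s, \beta) = \sum_i \int_{[0,s]} \{z_i - \mu(t, u)\}\, N_i(t, du)$

with

$\mu(t, u) = \dfrac{\sum_{j \in R(t,u)} z_j \exp(\beta z_j)}{\sum_{j \in R(t,u)} \exp(\beta z_j)}$.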

It is easy to see that $\dot\ell(t, \beta) = \dot\ell(t, t, \beta)$. The score process can be used directly to test the
hypothesis $H_0 : \beta = \beta_0$, and its zeroes yield partial maximum likelihood estimators of $\beta$.
The asymptotic distribution theory used for probability calculations is based on the fact
that under mild regularity conditions

$\dot\ell(t, \beta) / \{-\ddot\ell(t, \beta)\}^{1/2}$

has asymptotically a standard normal distribution. (Here $\ddot\ell(t, \beta)$ is the second derivative of
the log partial likelihood with respect to $\beta$.) See Cox (1975) for an informal treatment and
Gill (1980) for a sophisticated discussion based on martingale theory.
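
To make the score and the observed information concrete, the following sketch evaluates them for a small data set at a single analysis time. It is a simplified illustration under stated assumptions: the function name and data layout are hypothetical, each record is (covariate, observed age, death indicator) with observed age equal to the smaller of the survival and censoring ages, staggered entry has already been accounted for by restricting to patients who have arrived, and there are no tied death ages.

```python
import math


def cox_score_and_information(data, beta=0.0):
    """Return (score, information) for the partial likelihood at a fixed
    analysis time.

    data: list of (z, age, died) tuples; 'age' is the observed age
    min(survival, censoring) and 'died' is True for an observed death.
    Assumes no tied death ages."""
    score = 0.0
    info = 0.0
    for z_i, age_i, died_i in data:
        if not died_i:
            continue
        # Risk set: patients whose observed age is at least the age at this death.
        risk = [z for z, age, _ in data if age >= age_i]
        weights = [math.exp(beta * z) for z in risk]
        total = sum(weights)
        mean = sum(w * z for w, z in zip(weights, risk)) / total
        second = sum(w * z * z for w, z in zip(weights, risk)) / total
        score += z_i - mean             # contribution z_i - mu(t, u) to the score
        info += second - mean * mean    # weighted variance of z over the risk set
    return score, info
```

For a 0/1 treatment indicator and $\beta = 0$, each death contributes $p(1 - p)$ to the information, where $p$ is the fraction of the risk set on treatment; this equals $1/4$ when the risk set is balanced. The standardized statistic is `score / math.sqrt(info)`.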


The appropriate generalization for purposes of sequential analysis is that $\dot\ell(t, \beta)$, when
plotted against $-\ddot\ell(t, \beta)$ as the “time” parameter, behaves like standard Brownian motion.
By virtue of the Taylor series expansion

$\dot\ell(t, \beta_0) = \dot\ell(t, \beta) + (\beta - \beta_0)\{-\ddot\ell(t, \beta_0)\} + o(\beta - \beta_0)$,

we see that for $\beta$ close to $\beta_0$, the test statistic $\dot\ell(t, \beta_0)$ plotted against $\{-\ddot\ell(t, \beta_0)\}$ as time
behaves like Brownian motion with drift $\beta - \beta_0$ when $\beta$ is the true value of the parameter.
See Figure 2.


Formulating precisely and proving the claims of the preceding paragraph are a
substantial undertaking, which is not attempted here. Sellke and Siegmund (1983) give a fairly
complete discussion under the additional assumption that the triples $(z_i, x_i, c_i)$ are
independently and identically distributed. A still more difficult argument is required if, as seems
desirable, one treats the $z$'s and $c$'s as ancillary and conditions on their values (Sellke, 1985).

The reason that it is much more difficult to study the score function as a process in $t$
than marginally for fixed $t$ is briefly the following. By rewriting (1.10) as

$\dot\ell(t, s, \beta) = \sum_i \int_{[0,s]} \{z_i - \mu(t, u)\} \{N_i(t, du) - I\{i \in R(t, u)\}\, d\Lambda_i(u)\}$,

one can easily show that (1.10) is a martingale in $s$ for each fixed $t$. Hence martingale central
limit theory is tailor-made to study the behavior of (1.10) as a process in $s$ and in particular
its marginal distribution for $s = t$. However, (1.10) with $s = t$ is not in general a martingale
in $t$ (although it is in the degenerate case that all arrival times $y_i$ are the same). Sellke
and Siegmund (1983) show that $\dot\ell(t, t, \beta)$ can, however, be approximated by a martingale
uniformly in $t$; and they then apply martingale central limit theory to this approximating
martingale. Sellke (1985) observes and exploits the fact that for $t_1 < t_2 < t_3 < t_4$

[display missing in source]

are orthogonal martingales in $s$.

It  is  customary  for  data  monitoring  committees  to  meet  at  roughly  equal  intervals  of 
real  time  (e.g.  every  six  months).  According  to  the  central  limit  theory  discussed  above, 
if  one  proposed  to  apply  the  methods  developed  in  the  preceding  section,  intervals  of  time 
should  not  be  measured  by  equal  calendar  intervals,  but  by  equal  intervals  of  increase 
in the observed Fisher information, $\{-\ddot\ell(t, \beta)\}$. Siegmund (1985) describes a Monte Carlo
experiment, which among other things indicates that this discrepancy may not be important,
at least if the arrival and censoring mechanisms are not too erratic.

1.3  Example. 

A sequential clinical trial which has recently been described in considerable detail
in the medical-statistical literature is the randomised trial of propranolol conducted by
the β-Blocker Heart Attack Trial Research Group (cf. BHAT, 1982, and DeMets, Hardy,
Friedman, and Lan, 1984). Over a period of about twenty-seven months 3837 victims of
acute myocardial infarction were randomised to a placebo group (1921) or a treatment
group (1916). The principal endpoint was a survival time, which was assumed to follow a
proportional hazards model.


The data monitoring committee planned reviews of the results to date at $t$ = 1, 1.5,
2, 2.5, 3, 3.5, and 4 years. It was assumed that these would correspond to seven reviews
at approximately equally spaced increments of increase in the observed Fisher information.
The stopping rule used in this experiment was defined by parallel straight lines as in Remark
1.6, but for illustrating the theory developed here, we shall consider a repeated significance
test. (For a discussion of the various factors in addition to the “stopping rule” which went
into the actual decision to terminate the experiment, see DeMets et al., 1984.)

Let $t_n$ denote the time of the $n$th planned inspection, $n = 1, 2, \ldots, 7$, and consider the
stopping rule

(1.11)  $T = \inf\{t_n : n \ge m_0,\ |\dot\ell(t_n, 0)| > b\{-\ddot\ell(t_n, 0)\}^{1/2}\}$.

To test the hypothesis $H_0 : \beta = 0$ of no treatment effect, stop sampling at $\min(T, t_7)$ and
reject $H_0$ if either $T \le t_7$ or $T > t_7$ and $|\dot\ell(t_7, 0)| > c\{-\ddot\ell(t_7, 0)\}^{1/2}$. The normal approximation
described above indicates that $m_0 = 2$, $m = 7$, $b = 2.65$, and $c = 2.05$ yield a .05 level test
having a power function very close to that of the sequential design used in BHAT (1982).
The power function and approximate expected sample size for this test in the approximating
normal model are given in Table 4.


Table  4 


Repeated Significance Test for Normal Data
$b = 2.65$, $c = 2.05$, $m_0 = 2$, $m = 7$


Sample  Size 


3.59 


To relate the power as a function of the normal mean $\mu$ to the parameter $\beta$, it is
necessary to make some assumptions about the rate of increase of $-\ddot\ell(t, 0)$. For the simple
model we have discussed, if the censoring mechanism does not depend on the covariate,
it is easy to see that for $\beta$ close to 0 each death yields on the average about $1/4$ unit of
information. In the 3.5 years before this experiment was terminated there were 318 deaths
for an accumulation of approximately 79.5 units of information, or an average of 13.3 units
per inspection period. This means that a value of $\mu$ in Table 4 corresponds roughly to a
value of $\beta = \mu/(13.3)^{1/2}$. In particular the row for $\mu = 1.25$ in Table 4 corresponds to
$\beta$ about equal to .34. (The discussion of sample size selection in BHAT, 1981, shows the
expectation before the experiment began of a somewhat more rapid rate of accumulation
of information, hence greater power, than actually occurred.)
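
The arithmetic behind this conversion is short enough to spell out. In the sketch below the variable and function names are illustrative, and the count of six inspection periods is an assumption inferred from the reviews at 1, 1.5, ..., 3.5 years.

```python
import math

deaths = 318                 # deaths observed in the 3.5 years before termination
info_per_death = 0.25        # approximate information per death for beta near 0
inspections = 6              # assumed: reviews at 1, 1.5, 2, 2.5, 3, 3.5 years

total_info = deaths * info_per_death             # about 79.5 units
info_per_inspection = total_info / inspections   # about 13.3 units per period


def beta_from_mu(mu, info_per_period):
    """Convert a normal mean per inspection period into the hazard parameter,
    using beta = mu / (information per period)^{1/2}."""
    return mu / math.sqrt(info_per_period)
```

For instance, `beta_from_mu(1.25, 13.3)` is roughly .34, matching the value quoted for the row $\mu = 1.25$.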

Similarly, the expected sample size in Table 4 multiplied by an average rate of accumulation
of information gives the expected information until termination of the experiment.
This  in  itself  may  not  be  as  meaningful  as,  for  example,  the  expected  number  of  deaths  or 
the  expected  real  time  of  the  experiment.  If  we  use  the  approximation  that  information 
equals  one  fourth  the  number  of  deaths,  then  expected  information  is  directly  proportional 
to  a  more  meaningful  quantity.  Without  much  stronger  modeling  assumptions,  involving 
the  arrival  rate  and  the  baseline  hazard  function,  there  is  no  relation  between  expected 
information and expected real time for the experiment. Qualitatively, information accumulates
more slowly early in the experiment, so a reduction in expected information of 50%
compared  to,  say,  a  fixed  sample  test,  invariably  means  a  smaller  reduction  in  the  expected 
time  of  the  experiment. 

The observed values of $\dot\ell/(-\ddot\ell)^{1/2}$ at 1, 1.5, ..., 3.5 years were respectively 1.68, 2.24,
2.37, 2.30, 2.34, 2.82 (DeMets et al., 1984). For the test actually used and also for the
repeated significance test suggested above, these data lead to termination of the experiment
at $t = 3.5$ years, or six months before the final planned inspection. (A more detailed analysis
using a number of covariates gave the value 3.05 for the corresponding normalized statistic,
which is reasonably consistent with the 2.82 of the simplest possible model. See BHAT,
1982.) The values of $\dot\ell(t, 0)$ and $-\ddot\ell(t, 0)$ are not given separately, so it is not possible to
plot $\dot\ell$ against $-\ddot\ell$ as in Figure 2. This is unfortunate because such a plot would allow one
to check whether inspections indeed occur at approximately equal increments of increase of
information; and much more importantly, since the plot should be approximately a straight
line with slope $\beta$, it would give a visual estimate of $\beta$ and a visual goodness of fit test of
the proportional hazards model. See Siegmund (1985) for a hypothetical reconstruction of
$-\ddot\ell$ based on the assumption that each death contributes one fourth unit of information and
calculation of a confidence interval for $\beta$.

According to the definition of attained significance given in Section 1.1, the two-sided
p-value of T = 6 in the approximating normal model is $P_0\{T \le 6\} \approx .023$. The attained
level announced in BHAT (1982) is .01, which seems to be too small, even if one takes into
account the different stopping rule and the possibility of slight variations in the definition
of the p-value.


2. Boundary Crossing Probabilities

2.1  Introduction  and  Asymptotic  Normalisation 

In  this  section  we  consider  the  mathematical  problem  of  calculating  approximately 
probabilities  like  (1.5). 

Let $x_1, x_2, \ldots$ be independent and identically distributed, and set $S_n = x_1 + \cdots + x_n$.

Let $c(n)$, $n = 1, 2, \ldots$ be constants and $m_0 < m$ positive integers. Define the stopping time

$T = \inf\{n : n \ge m_0,\ S_n > c(n)\}$,

and consider the problem of evaluating

(2.1)  $P\{T \le m\}$

or

(2.2)  $P\{T < m \mid S_m = \xi\}$.

Since $P\{T \le m\} = P\{S_m > c(m)\} + \int_{-\infty}^{c(m)} P\{T < m \mid S_m = \xi\}\,P\{S_m \in d\xi\}$, and since
the distribution of $S_m$ is comparatively easy to evaluate, at least approximately, a good
approximation to (2.2) usually yields a good approximation to (2.1). Similarly, evaluation
of (2.2) is frequently the principal ingredient in calculating (1.5). Hence our focus in what
follows is on developing approximations to (2.2), which occasionally is of interest in itself.

Since  (2.2)  can  only  rarely  be  evaluated  exactly,  it  is  convenient  to  imbed  our  problem 
in  a  sequence  of  problems  and  seek  an  asymptotic  approximation.  The  actual  calculations 
are  preceded  by  some  remarks  about  the  two  most  obvious  asymptotic  formulations. 

In problems scaled for large deviations, we consider the asymptotic evaluation as
$m \to \infty$ of probabilities of the form

$p(m) = P\{S_n > m\,c(n/m) \text{ for some } m_0 \le n < m \mid S_m = \xi\}$,  ($\xi = m\xi_0$).

Since the boundary $m\,c(n/m)$ is $O(m^{1/2})$ standard deviations away from the (conditional)
mean path of $S_n$, these probabilities typically converge to zero, and a reasonable approximation
would be of the form $p(m) \sim q(m)$ for some easily evaluated analytic expression
$q(m)$.

An alternative, the ordinary deviation or diffusion scaling, suggests consideration of

$p'(m) = P\{S_n > m^{1/2}c(n/m) \text{ for some } m_0 \le n < m \mid S_m = \xi\}$,  ($\xi = m^{1/2}\xi_0$).

Now the mean path of $S_n$ is $O(1)$ standard deviations from the boundary $m^{1/2}c(n/m)$, so
typically if $m_0/m \to t_0$

$p'(m) \to p = P\{W(t) > c(t) \text{ for some } t_0 \le t < 1 \mid W(1) = \xi_0\}$,

where $W(t)$, $0 \le t < \infty$, is a Brownian motion process. The approximation of $p'(m)$ by $p$ is
often not particularly good, but it can be improved by finding an expansion of the form

$p'(m) = p + p_1 m^{-1/2} + o(m^{-1/2})$,

which has been called a corrected diffusion approximation (cf. Siegmund, 1984, and references
cited there).

Typically, large deviation approximations are more easily obtained than corrected
diffusion approximations. This is especially true for nonlinear boundaries, $c(n)$. See Hogan
(1984) for the first corrected diffusion approximations in a nonlinear case. Occasionally it
is possible to write a single approximation which is applicable to both cases. When this is
so, that approximation is usually a very good one. Except for a few remarks, only large
deviation scaling is considered in what follows.


Numerous methods have been invented for approximating boundary crossing probabilities
(e.g. Borovkov, 1962, Woodroofe, 1976b, Lai and Siegmund, 1977, Daniels, 1974,
Jennen and Lerche, 1981, Durbin, 1981). The method described below has the virtues that
it is essentially the same in both discrete and continuous time, it is fairly general, and it
yields exact results in most of the simple situations where exact results can be obtained.
Our starting point is a derivation of the standard reflection principle for Brownian motion.
The argument is then incrementally modified to deal with problems in discrete time
and problems involving nonlinear boundaries. Woodroofe (1982) contains an exposition of
alternative methods supported by complete proofs.

2.2  Reflection  Principle  for  Brownian  Motion 

Let $W(t)$, $0 \le t < \infty$, be Brownian motion with drift $\mu$ and unit scale parameter, and
let $\mathcal{F}(t)$ denote the $\sigma$-field of events defined by $W(s)$, $0 \le s \le t$. It will be convenient to
use the notation

$P_\xi^{(m)}(A) = P_\mu(A \mid W(m) = \xi)$  ($A \in \mathcal{F}(m)$).

By the sufficiency of $W(m)$, this conditional probability does not depend on $\mu$. For all
$\xi_1 \ne \xi_2$ and $t < m$, the probabilities $P_{\xi_1}^{(m)}$ and $P_{\xi_2}^{(m)}$ when restricted to $\mathcal{F}(t)$ are mutually
absolutely continuous; and a straightforward calculation shows that the likelihood ratio of
$W(s)$, $s \le t$, under $P_{\xi_2}^{(m)}$ relative to $P_{\xi_1}^{(m)}$ is

(2.3)  $\ell^{(m)}(t, W(t); \xi_1, \xi_2) = \exp\{[(\xi_2 - \xi_1)W(t) - \tfrac{1}{2}tm^{-1}(\xi_2^2 - \xi_1^2)]/(m - t)\}$.

The following is a version of Wald's likelihood ratio identity, which can be proved by standard
martingale arguments.

Proposition 2.4. For any $\xi_1 \ne \xi_2$, $m > 0$, stopping time $T$ and event $A \in \mathcal{F}(T)$

$P_{\xi_2}^{(m)}(A \cap \{T < m\}) = E_{\xi_1}^{(m)}[\ell^{(m)}(T, W(T); \xi_1, \xi_2);\ A \cap \{T < m\}]$,

where $\ell^{(m)}$ is given by (2.3).

Let $b > 0$, $-\infty < \eta < \infty$, and define

$\tau = \inf\{t : W(t) \ge b + \eta t\}$.

Let $\xi_1 = \xi < b + \eta m$, and let $\xi_2 = 2(b + \eta m) - \xi$ be $\xi$ "reflected" about $b + \eta m$. See Figure
3. Since $W(\tau) = b + \eta\tau$ on $\{\tau < m\}$, and $P_{\xi_2}^{(m)}\{\tau < m\} = 1$, from Proposition 2.4 and
simple algebra one obtains the well-known result

(2.5)  $P_\xi^{(m)}\{\tau < m\} = \exp[-2b(b + \eta m - \xi)/m]$.
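
The "simple algebra" behind (2.5) can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the likelihood ratio has the form $\exp\{[(\xi_2 - \xi_1)w - \tfrac{1}{2}tm^{-1}(\xi_2^2 - \xi_1^2)]/(m - t)\}$ and evaluates it on the boundary $w = b + \eta t$ with $\xi_2$ the reflected endpoint. The function names and the numerical values of $b$, $\eta$, $m$, $\xi$ are illustrative assumptions. If the algebra is right, the reciprocal of the likelihood ratio should not depend on the crossing time and should equal the right hand side of (2.5).

```python
import math


def bridge_crossing_prob(b: float, eta: float, m: float, xi: float) -> float:
    """Right hand side of (2.5): P{W(t) >= b + eta*t for some t < m | W(m) = xi}."""
    if b <= 0 or m <= 0:
        raise ValueError("b and m must be positive")
    if xi >= b + eta * m:
        return 1.0  # endpoint on or above the line: crossing is certain
    return math.exp(-2.0 * b * (b + eta * m - xi) / m)


def likelihood_ratio(t: float, w: float, xi1: float, xi2: float, m: float) -> float:
    """Assumed form of (2.3): path up to time t, endpoint xi2 relative to endpoint xi1."""
    return math.exp(((xi2 - xi1) * w - 0.5 * t * (xi2**2 - xi1**2) / m) / (m - t))


# Illustrative values (hypothetical, chosen only for the check).
b, eta, m, xi = 1.5, 0.2, 4.0, 0.5
xi_reflected = 2.0 * (b + eta * m) - xi

for tau in (0.5, 2.0, 3.5):
    # On {tau < m} the path sits exactly on the line at time tau.
    lr = likelihood_ratio(tau, b + eta * tau, xi, xi_reflected, m)
    print(f"tau={tau}: 1/LR={1.0 / lr:.6f}  (2.5)={bridge_crossing_prob(b, eta, m, xi):.6f}")
```

The three printed rows agree, which is the content of the reflection argument: on the boundary the likelihood ratio is a constant, so it factors out of the expectation in Proposition 2.4.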

Siegmund  and  Yuh  (1982)  show  how  a  slightly  more  sophisticated  version  of  this  ar¬ 
gument  yields  Anderson’s  (1960)  results. 

2.3  Correction  for  Discrete  Time 

Consider now the same problem in discrete time, so

$\tau = \inf\{n : S_n > b + \eta n\}$,

where $S_n = x_1 + \cdots + x_n$, and under $P_\mu$ the $x$'s are independent normally distributed random
variables with mean $\mu$ and variance 1. Now the preceding argument yields

(2.6)  $P_\xi^{(m)}\{\tau < m\}\exp[2b(b + \eta m - \xi)/m] = E_{\xi_2}^{(m)}[\exp\{-2(b + \eta m - \xi)(S_\tau - b - \eta\tau)/(m - \tau)\};\ \tau < m]$,

where $\xi_2 = 2(b + \eta m) - \xi$.

To analyze the right hand side of (2.6) asymptotically, suppose that $b = \zeta m$ and
$\xi = m\xi_0$ for some fixed $\zeta > 0$ and $\xi_0 < \zeta + \eta$. Since the $P_{\xi_2}^{(m)}$-deviations of $S_n$ from its
expectations, $[2(\zeta + \eta) - \xi_0]n$, are of order $n^{1/2}$, a law of large numbers argument shows
that with probability approaching 1, $S_n$ crosses the line $\zeta m + \eta n$ near where its line of drift
does, so for any $\varepsilon > 0$

(2.7)  $\lim_{m\to\infty} P_{\xi_2}^{(m)}\{|m^{-1}\tau - \zeta(2\zeta + \eta - \xi_0)^{-1}| > \varepsilon\} = 0$.

See Figure 3. It follows that the right hand side of (2.6) has the same asymptotic behavior
as

$E_{\xi_2}^{(m)}(\exp\{-2(2\zeta + \eta - \xi_0)(S_\tau - \zeta m - \eta\tau)\};\ \tau < m)$.

[Figure 3: horizontal axis $n$; marked points $(\zeta(2\zeta + \eta - \xi_0)^{-1}m, 0)$ and $(m, 0)$.]

If this expectation were with respect to the unconditional probability with the same
drift, $P_{2(\zeta+\eta)-\xi_0}$, one could apply the renewal theorem in the manner which Feller (1972,
Chapter XII) uses to derive Cramér's estimate for the probability of ruin, and hence evaluate
this expectation in the limit as $m \to \infty$. Specifically, observe that for a random walk $S_n$, $n = 1, 2, \ldots$,
with non-negative drift $\mu = ES_1$ and for

$\tau = \inf\{n : S_n > a\}$,

$S_\tau - a$ can be regarded as the residual lifetime in a renewal process defined by $S_{\tau_+}$, where
$\tau_+ = \inf\{n : S_n > 0\}$. Hence if $S_1$ is nonarithmetic the renewal theorem implies that as
$a \to \infty$

(2.8)  $P\{S_\tau - a \le x\} \to (ES_{\tau_+})^{-1}\int_0^x P\{S_{\tau_+} > y\}\,dy$.

See Feller (1972, Chapters XI and XII). For a discussion which is oriented towards the
present application, see Siegmund (1985, Chapter VIII).


During the relatively short time interval in which according to (2.7) $\tau$ falls with probability
close to one, the increments to the conditional $P_{\xi_2}^{(m)}$ process $S_n$ and the unconditional
$P_{2(\zeta+\eta)-\xi_0}$ process both behave essentially the same, so the $P_{\xi_2}^{(m)}$ and $P_{2(\zeta+\eta)-\xi_0}$ limiting
distributions of $S_\tau - \zeta m - \eta\tau$ are the same, and are given by (2.8) with $S_n$ replaced by $S_n - \eta n$.

One simple way to make this argument precise is to obtain a slightly different version of
(2.6) by using Wald's likelihood ratio identity to differentiate $P_\xi^{(m)}$ with respect to $P_{2(\zeta+\eta)-\xi_0}$
instead of $P_{\xi_2}^{(m)}$. Let $\mu_2 = \xi_2/m = 2(\zeta + \eta) - \xi_0$. An easy calculation shows that the likelihood
ratio of $x_1, \ldots, x_n$ under $P_{\xi_2}^{(m)}$ relative to $P_{\mu_2}$ is

$\exp[-\tfrac{1}{2}(S_n - \mu_2 n)^2/(m - n)]/(1 - n/m)^{1/2}$,

so the right hand side of (2.6) equals

(2.9)  $E_{\mu_2}[\exp\{-2(\zeta + \eta - \xi_0)(S_\tau - b - \eta\tau)/(1 - \tau/m) - \tfrac{1}{2}(S_\tau - \mu_2\tau)^2/(m - \tau)\}/(1 - \tau/m)^{1/2};\ \tau < m]$.


The asymptotic marginal distributions of the random variables appearing in (2.9) are
easily determined. Under $P_{\mu_2}$, $\tau/m$ converges in probability to the same limit as in (2.7);
the renewal theorem applies as in (2.8) with $S_n$ replaced by $S_n - \eta n$ and $a = b$; and by an easy application of
Anscombe's theorem, $(S_\tau - \mu_2\tau)/\tau^{1/2}$ is asymptotically standard normal. Also by Lemma
2.16 below, $S_\tau - \eta\tau - b$ and $(S_\tau - \mu_2\tau)/\tau^{1/2}$ are asymptotically independent. Thus we have
all the ingredients to evaluate (2.9). From (2.8) and some calculation one sees that the limit
of (2.9) is $\nu[2(2\zeta + \eta - \xi_0)]$, where for $\mu > 0$ and $\tau_+ = \inf\{n : S_n > 0\}$

(2.10)  $\nu(\mu) = [1 - E_{\mu/2}\exp(-\mu S_{\tau_+})]/[\mu E_{\mu/2}(S_{\tau_+})]$.

Hence by (2.6)

(2.11)  $P_\xi^{(m)}\{\tau < m\} \sim \nu[2(2\zeta + \eta - \xi_0)]\exp[-2m\zeta(\zeta + \eta - \xi_0)]$


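($b = m\zeta$, $\xi = m\xi_0$, $\zeta > 0$, $\xi_0 < \zeta + \eta$). Random walk theory (cf. Feller, 1972, Chapter XVIII
or Siegmund, 1985, Chapter VIII) permits one to obtain a numerically calculable expression

(2.12)  $\nu(\mu) = 2\mu^{-2}\exp[-2\sum_{n=1}^{\infty} n^{-1}\Phi(-\tfrac{1}{2}\mu n^{1/2})]$,

where $\Phi$ is the standard normal distribution function. For many purposes it suffices to use
the approximation

(2.13)  $\nu(\mu) = \exp(-\rho\mu) + o(\mu^2)$  ($\mu \to 0$),

where

(2.14)  $\rho = E_0 S_{\tau_+}^2/2E_0(S_{\tau_+}) = -\pi^{-1}\int_0^\infty \lambda^{-2}\log[2\lambda^{-2}\{1 - \exp(-\lambda^2/2)\}]\,d\lambda \cong .583$.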
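
To see how close the exponential approximation to $\nu$ is in practice, the following sketch evaluates $\nu$ numerically and compares it with $\exp(-\rho\mu)$. It assumes the standard series representation $\nu(\mu) = 2\mu^{-2}\exp[-2\sum_{n \ge 1} n^{-1}\Phi(-\tfrac{1}{2}\mu n^{1/2})]$ for normal increments and the value $\rho = .583$ quoted in the text; the function names, the truncation tolerance and the grid of $\mu$ values are illustrative choices, not part of the paper.

```python
import math

RHO = 0.583  # value of rho quoted in the text


def std_normal_cdf(x: float) -> float:
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def nu(mu: float, tol: float = 1e-12, max_terms: int = 1_000_000) -> float:
    """Series evaluation of nu(mu); the sum is truncated once a term drops below tol."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    total = 0.0
    for n in range(1, max_terms + 1):
        term = std_normal_cdf(-0.5 * mu * math.sqrt(n)) / n
        total += term
        if term < tol:
            break
    return 2.0 / mu**2 * math.exp(-2.0 * total)


for mu in (0.25, 0.5, 1.0, 2.0):
    print(f"mu={mu}: series={nu(mu):.4f}  exp(-rho*mu)={math.exp(-RHO * mu):.4f}")
```

The series needs more terms as $\mu$ decreases (roughly of order $\mu^{-2}$), which is one practical reason the closed-form approximation is convenient for small $\mu$.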

Partial justification for (2.13) comes from a Taylor series expansion of (2.10) to obtain

(2.15)  $\nu(\mu) = 1 - \mu E_0(S_{\tau_+}^2)/2E_0(S_{\tau_+}) + \cdots$.

This is easily turned into a proof of (2.13) with an error $o(\mu)$. That the error is actually
$o(\mu^2)$ and that $\rho$ has the value given in (2.14) are more difficult to prove. See Siegmund
(1985, Chapter X) for details.

To complete the proof of (2.11), we must justify the asymptotic independence of
$(S_\tau - \mu_2\tau)/\tau^{1/2}$ and $S_\tau - \eta\tau - b$ used in evaluating the limit of (2.9). The first person to have
noticed this relation appears to have been Stam (1968).

Lemma 2.16. Let $S_n$, $n = 1, 2, \ldots$ be a non-arithmetic random walk with drift $E(S_1) =
\mu > 0$ and finite variance $\sigma^2 = \mathrm{var}(S_1)$. Let $\tau = \tau(a) = \inf\{n : S_n > a\}$. As $a \to \infty$, for all
$x > 0$, $-\infty < y < \infty$

$P\{S_\tau - a \le x,\ (\tau - a\mu^{-1})/(a\sigma^2\mu^{-3})^{1/2} \le y\} \to H(x)\Phi(y)$

and

$P\{S_\tau - a \le x,\ (S_\tau - \mu\tau)/(\sigma\tau^{1/2}) \le y\} \to H(x)\Phi(y)$,

where $H$ is the distribution function given in (2.8).

Remark. A similar result holds for arithmetic random walks, but the distribution $H$
is slightly different. A result corresponding to the first relation in Lemma 2.16 holds if
$\mu = 0$, but in this case the appropriately normalised $\tau$ is $\tau/a^2$, and $\Phi$ must be replaced by
2*(jT  »/*)  -  1.

Proof of Lemma 2.16. Since by (2.8)

the second asymptotic relation follows from the first one. To prove the first one, let $n =
n(a, y) = a\mu^{-1} + y(a\sigma^2\mu^{-3})^{1/2}$. Then

$P\{S_\tau - a \le x,\ \tau > n\} = E[P\{S_\tau - a \le x \mid \tau > n,\ S_n\};\ \tau > n]$.

Suppose $a_1 < a$, $a - a_1 \to \infty$, but $a - a_1 = o(a^{1/2})$. Then by the central limit theorem

$E[P\{S_\tau - a \le x \mid \tau > n,\ S_n\};\ \tau > n,\ a_1 < S_n \le a] \le P\{a_1 < S_n \le a\} \to 0$.

Also, uniformly on $\{\tau > n,\ S_n \le a_1\}$, by (2.8)

$P\{S_{\tau(a)} - a \le x \mid \tau > n,\ S_n = z\} = P\{S_{\tau(a-z)} - (a - z) \le x\} \to H(x)$.

Hence

$P\{S_\tau - a \le x,\ \tau > n\} = H(x)P\{\tau > n,\ S_n \le a_1\} + o(1) = H(x)P\{\tau > n\} + o(1)$.

The lemma follows from the well-known and easily proved asymptotic normality of $\tau$ with
the indicated scaling.

Using (2.13), one can rewrite (2.11) in the form

(2.17)  $P_\xi^{(m)}\{\tau < m\} \cong \exp\{-2(b + \rho)(b + \rho + \eta m - \xi)/m\}$.

This last approximation is particularly interesting because it is of the form (2.5) with $b$
replaced by $b + \rho$. Moreover, it follows from (2.8) and (2.13)-(2.15) that $\rho$ is approximately
the expected excess of the discrete random walk over the boundary, so (2.17) has the
interpretation that to correct for discrete time one can use the Brownian motion result
(2.5) with boundaries displaced by the average amount the discrete time process jumps
over the boundary (cf. Siegmund, 1984 for other results having a similar interpretation).

The approximation (2.17) is also valid as a corrected diffusion approximation, i.e. if
$b = \zeta m^{1/2}$, $\eta = \eta_0 m^{-1/2}$, and $\xi = \xi_0 m^{1/2}$, the difference between the two sides of (2.17)
is $o(m^{-1/2})$. This result can be proved along the lines of the argument sketched above;
but the details are more difficult because the distribution of $m^{-1}\tau$ does not become
degenerate as $m \to \infty$. See Siegmund (1984).

The accuracy of (2.17) is quite good. For $m = 3$, $b = 1.564$, and $\eta = 0$ Worsley (1983)
has numerically calculated $P_0^{(m)}\{\tau < m\}$ to be .05. The approximation (2.17) yields .0463.
For the corresponding comparison when $m = 5$, $b = 2.165$ ($m = 10$, $b = 3.292$), (2.17) gives
.0488 (.0496). It is perhaps worth observing that an uncorrected diffusion approximation
is very poor for small $m$, ranging from .1 to .2 for these examples.
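
The figures quoted above can be reproduced directly. The sketch below assumes the displaced-boundary form $\exp\{-2(b + \rho)(b + \rho + \eta m - \xi)/m\}$ with $\rho = .583$, conditional endpoint $\xi = 0$ and $\eta = 0$, and takes the three $(m, b)$ pairs as $(3, 1.564)$, $(5, 2.165)$, $(10, 3.292)$; setting $\rho = 0$ gives the uncorrected diffusion approximation. Function names are illustrative.

```python
import math

RHO = 0.583  # expected-overshoot constant quoted in the text


def corrected_prob(b: float, m: float, eta: float = 0.0, xi: float = 0.0,
                   rho: float = RHO) -> float:
    """Approximation of the form (2.17): Brownian result with b displaced to b + rho."""
    return math.exp(-2.0 * (b + rho) * (b + rho + eta * m - xi) / m)


def uncorrected_prob(b: float, m: float, eta: float = 0.0, xi: float = 0.0) -> float:
    """Plain Brownian bridge result, i.e. the same formula with rho = 0."""
    return corrected_prob(b, m, eta, xi, rho=0.0)


# (m, b) pairs for which the exact conditional probability is reported as .05.
cases = [(3, 1.564), (5, 2.165), (10, 3.292)]
for m, b in cases:
    print(f"m={m:2d} b={b}: corrected={corrected_prob(b, m):.4f}  "
          f"uncorrected={uncorrected_prob(b, m):.4f}")
```

Running this gives corrected values of about .0463, .0488 and .0496 against the target .05, while the uncorrected values fall between roughly .11 and .20, which matches the comparison described in the text.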

The asymptotic relation (2.11) can be generalised to a large class of random walks
whose distribution can be imbedded in an exponential family (Siegmund, 1982). The analogue
of $\nu$ given in (2.10) can be computed numerically using results of Woodroofe (1979)
or by an approximation along the lines of (2.13). With some technical improvements the
method also works for a general class of nonlinear boundaries. The key is (2.7), which
suggests that if the boundary is to be crossed at all, it will be crossed close to some distinguished
point. This further suggests that one try to approximate the boundary by its
tangent at this distinguished point, which can be determined as the point through which
the $P_{\xi_2}^{(m)}$ line of drift passes when $\xi_2$ is appropriately chosen for the linear problem of the
tangent line. Siegmund (1982) discusses the example of repeated significance tests in detail.

2.4  Repeated  Significance  Tests 

For repeated significance tests in exponential families a slightly modified method requires
considerably less algebraic detail. We continue to consider the case of normally
distributed observations, and let $T$ be defined by (1.1).

Theorem 2.18. Suppose $b \to \infty$, $m \to \infty$ and $m_0 \to \infty$ in such a way that for fixed
$0 < \mu_1 < \mu_0 < \infty$, $bm^{-1/2} = \mu_1$, $bm_0^{-1/2} = \mu_0$. Let $0 < |\xi_0| < \mu_1$ and $\xi = m\xi_0$. Then

$P_\xi^{(m)}\{T < m\} \le (\mu_0\mu_1^{-1})\exp[-\tfrac{1}{2}m(\mu_1^2 - \xi_0^2)]$,

and for $\mu_1^2/\mu_0 < |\xi_0| < \mu_1$

$P_\xi^{(m)}\{T < m\} \sim \nu(\mu_1^2/|\xi_0|)\,\mu_1|\xi_0|^{-1}\exp[-\tfrac{1}{2}m(\mu_1^2 - \xi_0^2)]$,

where $\nu$ is given by (2.12).

Corollary 2.19. Suppose that the asymptotic scaling of Theorem 2.18 holds and also
$cm^{-1/2} = \gamma \in (\mu_1^2/\mu_0, \mu_1)$. Then for $\mu \ne 0$

(2.20)  $P_\mu\{T < m,\ |S_m| < cm^{1/2}\} \sim \nu(\mu_1^2\gamma^{-1})\,\mu_1\gamma^{-1}\,\varphi[m^{1/2}(|\mu| - \gamma)]\,(m^{1/2}|\mu|)^{-1}\exp[-\tfrac{1}{2}m(\mu_1^2 - \gamma^2)]$,

and

(2.21)  $P_0\{T < m,\ |S_m| < cm^{1/2}\} \sim 2b\varphi(b)\int_{\mu_1^2\gamma^{-1}}^{\mu_0} x^{-1}\nu(x)\,dx$,

where $\varphi$ denotes the standard normal density function and $\nu$ is given by (2.12). The relation
(2.21) also holds when $c = b$ ($\gamma = \mu_1$); (2.20) and (2.21) hold if $m_0 = o(m)$ as $b \to \infty$.

Remark. Corollary 2.19 suggests that one approximate (1.5) by using (2.20) or (2.21) as
an approximation for the second term on the right hand side of (1.5). This is in fact the
approximation used to compute the entries in Tables 1-3. Strictly speaking (2.20) is not a
true asymptotic relation when $c = b$, but it usually gives a good approximation and is much
easier to evaluate than the asymptotically "correct" result. See Siegmund (1985, IX.3) for
a more complete discussion of this point.

Corollary 2.19 follows easily from the theorem, and some simple estimates which are
omitted here (cf. Siegmund, 1985, IX.3). A proof of Theorem 2.18 follows.

Proof of Theorem 2.18. First observe that in the derivation of (2.5) one could pretend
that time flows from $m$ to 0 instead of from 0 to $m$ and "reflect" the value $W(0) = 0$ to
$W(0) = 2b$ instead of reflecting $W(m) = \xi$ to $W(m) = 2(b + \eta m) - \xi$. Also recall that in
the derivation of (2.11) it was convenient to work with the unconditional probability with
the same drift as $P_{\xi_2}^{(m)}$.

Let

$P_{\lambda,\xi}^{(m)}(A) = P(A \mid S_0 = \lambda,\ S_m = \xi)$  ($A \in \mathcal{F}_m$),

and put

$Q_\xi^{(m)}(A) = \int_{-\infty}^{\infty} P_{\lambda,\xi}^{(m)}(A)\,\varphi[(\lambda - \xi)/m^{1/2}]\,d\lambda/m^{1/2}$.

Note that if we regard the process as running backwards from an "initial" value $S_m = \xi$,
then under $Q_\xi^{(m)}$ it is normal random walk with zero drift.

Let $T^* = \sup\{n : n < m,\ |S_n| > bn^{1/2}\}$, so

$P_\xi^{(m)}\{T < m\} = P_{0,\xi}^{(m)}\{T^* \ge m_0\}$.

It is easy to see that the likelihood ratio of $x_{n+1}, \ldots, x_m$ under $P_{\lambda,\xi}^{(m)}$ relative to $P_{0,\xi}^{(m)}$ is
$\ell^{(m)}(m - n, S_n - \xi; -\xi, \lambda - \xi)$, where $\ell^{(m)}$ is given by (2.3). This simplifies to

$\exp[\lambda S_n/n - \lambda^2/2n - \lambda\xi/m + \lambda^2/2m]$.

Hence by a straightforward integration one sees that the likelihood ratio of $x_{n+1}, \ldots, x_m$
under $Q_\xi^{(m)}$ relative to $P_{0,\xi}^{(m)}$ is

(2.22)  $\int_{-\infty}^{\infty}\exp[\lambda S_n/n - \lambda^2/2n - \lambda\xi/m + \lambda^2/2m]\,\varphi[(\lambda - \xi)/m^{1/2}]\,d\lambda/m^{1/2} = (n/m)^{1/2}\exp[S_n^2/2n - \xi^2/2m]$.
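
The "straightforward integration" in (2.22) is a Gaussian integral in $\lambda$, and it can be confirmed numerically. The sketch below assumes the integrand $\exp[\lambda S_n/n - \lambda^2/2n - \lambda\xi/m + \lambda^2/2m]$ weighted by the $N(\xi, m)$ density in $\lambda$, and compares a trapezoidal evaluation with the closed form $(n/m)^{1/2}\exp[S_n^2/2n - \xi^2/2m]$. The function names and the test values of $S_n$, $n$, $m$, $\xi$ are illustrative assumptions.

```python
import math


def mixture_lr_numeric(s_n: float, n: int, m: int, xi: float,
                       half_width_sd: float = 10.0, steps: int = 4000) -> float:
    """Trapezoidal evaluation of the lambda-integral in (2.22).

    After combining terms the integrand is Gaussian in lambda, centred at s_n
    with standard deviation sqrt(n), so the grid is centred there.
    """
    sd = math.sqrt(n)
    lo = s_n - half_width_sd * sd
    h = 2.0 * half_width_sd * sd / steps
    total = 0.0
    for i in range(steps + 1):
        lam = lo + i * h
        log_lr = lam * s_n / n - lam**2 / (2 * n) - lam * xi / m + lam**2 / (2 * m)
        # log of phi[(lam - xi)/sqrt(m)] / sqrt(m), the N(xi, m) density at lam
        log_weight = -0.5 * (lam - xi) ** 2 / m - 0.5 * math.log(2.0 * math.pi * m)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(log_lr + log_weight)
    return total * h


def mixture_lr_closed(s_n: float, n: int, m: int, xi: float) -> float:
    """Closed form on the right hand side of (2.22)."""
    return math.sqrt(n / m) * math.exp(s_n**2 / (2 * n) - xi**2 / (2 * m))


print(mixture_lr_numeric(6.0, 20, 50, 10.0), mixture_lr_closed(6.0, 20, 50, 10.0))
```

Working with the logarithm of the integrand before exponentiating avoids underflow in the normal weight when $\lambda$ is far from $\xi$.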

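Since $T^*$ is a stopping time for the process running backwards from time $m$ to time 0,
Wald's likelihood ratio identity yields the representation

(2.23)  $P_\xi^{(m)}\{T < m\} = P_{0,\xi}^{(m)}\{T^* \ge m_0\}$
        $= E_Q^{(m)}\{(m/T^*)^{1/2}\exp[-S_{T^*}^2/2T^* + \xi^2/2m];\ T^* \ge m_0\}$
        $= \exp[-\tfrac{1}{2}(b^2 - \xi^2/m)]\,E_Q^{(m)}\{(m/T^*)^{1/2}\exp[-\tfrac{1}{2}(S_{T^*}^2/T^* - b^2)];\ T^* \ge m_0\}$,

where $E_Q^{(m)}$ denotes expectation with respect to the mixture measure
$Q_\xi^{(m)} = \int_{-\infty}^{\infty} P_{\lambda,\xi}^{(m)}\,\varphi[(\lambda - \xi)/m^{1/2}]\,d\lambda/m^{1/2}$.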

The inequality in Theorem 2.18 is an immediate consequence of (2.23). To prove the
asymptotic relation, it remains to evaluate asymptotically the expectation in (2.23).

Observe that the joint distribution of $(T^*, S_{T^*})$ under the mixture measure in (2.23) is the
same as the $P_0$ joint distribution of $(m - \tau^*, \xi + S_{\tau^*})$, where

$\tau^* = \inf\{n : |\xi + S_n| > b(m - n)^{1/2}\}$.

Hence the expectation on the right hand side of (2.23) equals

(2.24)  $E_0\{(1 - \tau^*/m)^{-1/2}\exp[-\tfrac{1}{2}\{(\xi + S_{\tau^*})^2/(m - \tau^*) - b^2\}];\ \tau^* \le m - m_0\}$.

An easy law of large numbers argument shows that as $m \to \infty$

(2.25)  $m^{-1}\tau^* \to 1 - (\xi_0/\mu_1)^2$

in probability, and in particular

(2.26)  $P_0\{\tau^* \le m - m_0\} \to 1$ for $|\xi_0| > \mu_1^2/\mu_0$, and $\to 0$ for $|\xi_0| < \mu_1^2/\mu_0$.

If we were dealing with Brownian motion, for which there would be no excess over the
boundary, this would complete the argument. For the discrete time process, after using
(2.25) and (2.26) in (2.24), it suffices to show that

(2.27)  $\lim_{m\to\infty} E_0\exp\{-\tfrac{1}{2}[(\xi + S_{\tau^*})^2/(m - \tau^*) - b^2]\} = \nu(\mu_1^2/\xi_0)$,

which requires a renewal theorem (cf. (2.8)) for nonlinear functions of a random walk.

To verify (2.27) observe that (taking $\xi_0 > 0$) $\tau^*$ can be expressed

(2.28)  $\tau^* = \inf\{n : n \ge 1,\ \tfrac{1}{2}\mu_1^2\xi_0^{-1}n + S_n + (2m\xi_0)^{-1}S_n^2 > \tfrac{1}{2}m(\mu_1^2\xi_0^{-1} - \xi_0)\}$.

If the term involving $S_n^2$ did not appear in this expression, the renewal theorem would give
us the limiting distribution of the excess over the boundary. Because of (2.25) it seems
plausible that in the relatively small interval of time into which $\tau^*$ falls with probability
close to one, the quadratic term $m^{-1}S_n^2$ is effectively constant and hence has no effect on this
limiting distribution. Lai and Siegmund (1977) describe a general class of processes which
can be decomposed into the sum of a random walk and a term which varies sufficiently slowly
that the limiting distribution of excess over the boundary is determined by the random walk
alone via (2.8). Lai and Siegmund's result is not directly applicable here, but their method
is. See Appendix 2 for an informal discussion of nonlinear renewal theory. The consequence
is that

$\lim_{m\to\infty} P_0\{\tfrac{1}{2}\mu_1^2\xi_0^{-1}\tau^* + S_{\tau^*} + (2m\xi_0)^{-1}S_{\tau^*}^2 - \tfrac{1}{2}m(\mu_1^2\xi_0^{-1} - \xi_0) \le x\} = H(x)$,

where $H(x)$ is the limiting distribution as given in (2.8) for a random walk $S_n$ having
normally distributed increments with mean $\tfrac{1}{2}\mu_1^2\xi_0^{-1}$ and variance 1. With the aid of (2.25)
it is easy to convert this limiting result to

$\lim_{m\to\infty} P_0\{\tfrac{1}{2}[(m - \tau^*)^{-1}(m\xi_0 + S_{\tau^*})^2 - b^2] \le x\} = H(x\xi_0/\mu_1^2)$.

A trivial change of variable yields (2.27) with the same function $\nu$ that appears in (2.11)
as the limit of (2.9).

The method described above generalises in a straightforward fashion to repeated significance
tests in one parameter exponential families. For the much more difficult multiparameter
case, see Woodroofe (1978) and Lalley (1983). Hu (1985) shows that the present
method leads to simplifications and new results in the multiparameter case, especially when
there is some invariance present.

In the case of Brownian motion the preceding argument can easily be sharpened to yield
a second order term in an asymptotic expansion of $P_\xi^{(m)}\{T < m\}$ or $P_\mu\{T < m,\ |W(m)| < cm^{1/2}\}$.
When there is no excess over the boundary the only approximation involved in
the preceding argument is that of replacing $(1 - \tau^*/m)^{-1/2}$ by its limit as given by (2.25).
To obtain the next order of approximation, it is only necessary to expand $(1 - \tau^*/m)^{-1/2}$
in a Taylor series and analyze its central limit behavior. Although simple in principle, the
calculation is quite complicated in detail because one must consider three cases: $\xi_0$ close to
the endpoints of the interval $(\mu_1^2/\mu_0, \mu_1)$ and $\xi_0$ in the interior of this interval. Siegmund
(1985) shows under the conditions of Corollary 2.19, for $T$ defined by (1.1) with Brownian
motion $W(t)$ instead of $S_n$, for $\mu_1^2/\mu_0 < \gamma \le \mu_1$

(2.29)  $P_0\{T < m,\ |W(m)| < cm^{1/2}\} = (b - b^{-1})\varphi(b)\log(mc^2/m_0b^2) + b^{-1}\varphi(b)[3 - (bc^{-1})^2] + o(b^{-1}\varphi(b))$.

Miller and Siegmund (1982) discuss the history of the special case $c = b$ of (2.29),
which has been given incorrectly several times in the literature.

Using methods introduced by Woodroofe (1976b, 1982), Woodroofe and Takahashi
(1982) obtain the comparable approximation for $P_0\{T \le m\}$ in the discrete case. The result
is quite complicated and does not appear to yield generally more accurate approximations
than the one suggested here (i.e. the sum of (2.21) with $c = b$ and $P_0\{|S_m| \ge bm^{1/2}\} = 2[1 - \Phi(b)]$).


3. Other Boundary Crossing Problems

In Part 3 we consider a number of somewhat related (fixed sample) statistical problems
which involve boundary crossing probabilities. For historical reasons the Kolmogorov-Smirnov
and Anderson-Darling statistics are discussed briefly in Section 3.1. Section 3.2
is concerned with the mathematically similar but conceptually different problem of maximum
$\chi^2$ statistics. Sections 3.3-3.6 on change point problems are the primary focus of
the chapter. (These sections can be read independently of the first two.) As we shall
see, the methods of Part 2 occasionally deliver an appropriate approximation immediately,
sometimes additional work is required, and sometimes completely new methods are needed.

3.1 Kolmogorov-Smirnov and Anderson-Darling Statistics

Let $u_1, u_2, \ldots$ be independent and uniform on [0, 1], and let

$F_n(x) = n^{-1}\sum_{i=1}^{n} I\{u_i \le x\}$

be the empirical distribution function. As stated in the introduction, essentially the first
boundary crossing problem in statistics is that of finding the distribution of the one-sample
Kolmogorov-Smirnov statistic,

$\sup_x [x - F_n(x)]$.

The distribution can be evaluated exactly (e.g. Birnbaum and Tingey, 1951), but the result
is quite complicated. From the representation of the uniform order statistics as $W_k/W_{n+1}$,
$k = 1, 2, \ldots, n$, where $W_k = y_1 + \cdots + y_k$ with $y_1, y_2, \ldots$ independent standard exponential,
it follows that

$P\{\sup_x [x - F_n(x)] \ge \zeta\} = P\{\max_{1 \le j \le n}(W_j - j) \ge n\zeta - 1 \mid W_{n+1} - (n + 1) = -1\}$.

The methods of Part 2 yield a large deviation and a corrected diffusion approximation, both
of which are very accurate. See Siegmund (1982), Yuh (1982), and Siegmund (1984). Of
course, the limiting distribution is given by (2.5) with $\xi = 0$, $m = 1$, and $b = \zeta n^{1/2}$, but it
is not a particularly good approximation for small $n$.
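
To illustrate how the limiting value compares with the exact distribution for small samples, the sketch below evaluates the one-sided Kolmogorov-Smirnov tail probability in two ways. It assumes the finite-sum formula usually attributed to Birnbaum and Tingey, $P\{D_n^+ \ge \zeta\} = \zeta\sum_{j=0}^{\lfloor n(1-\zeta)\rfloor}\binom{n}{j}(\zeta + j/n)^{j-1}(1 - \zeta - j/n)^{n-j}$, and the limiting form $\exp(-2n\zeta^2)$ obtained from the Brownian bridge result with $\xi = 0$, $m = 1$, $b = \zeta n^{1/2}$. Function names and the sample sizes shown are illustrative.

```python
import math


def ks_one_sided_exact(n: int, zeta: float) -> float:
    """Exact P{sup_x [x - F_n(x)] >= zeta} for n uniform observations, 0 < zeta <= 1."""
    if not 0.0 < zeta <= 1.0:
        raise ValueError("zeta must lie in (0, 1]")
    total = 0.0
    for j in range(int(math.floor(n * (1.0 - zeta))) + 1):
        # Guard against a tiny negative base caused by floating-point rounding.
        base = max(0.0, 1.0 - zeta - j / n)
        total += math.comb(n, j) * (zeta + j / n) ** (j - 1) * base ** (n - j)
    return zeta * total


def ks_one_sided_limit(n: int, zeta: float) -> float:
    """Limiting (Brownian bridge) approximation exp(-2 n zeta^2)."""
    return math.exp(-2.0 * n * zeta**2)


for n, zeta in [(1, 0.5), (5, 0.3), (20, 0.2), (100, 0.1)]:
    print(f"n={n:3d} zeta={zeta}: exact={ks_one_sided_exact(n, zeta):.4f}  "
          f"limit={ks_one_sided_limit(n, zeta):.4f}")
```

For $n = 1$ and $\zeta = .5$ the exact probability is .5 while the limiting formula gives about .61, a concrete instance of the poor small-sample accuracy noted above; the gap narrows as $n$ grows.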

Since the Kolmogorov-Smirnov statistic is insensitive to departures in the tails from
the hypothesized distribution, Anderson and Darling (1952) proposed the goodness of fit
statistic (two-sided alternative)

(3.1)  $n^{1/2}\sup_{\varepsilon_1 \le x \le 1-\varepsilon_2}\{|F_n(x) - x|/[x(1 - x)]^{1/2}\}$  ($0 < \varepsilon_1 < 1 - \varepsilon_2 < 1$)

and observed that the asymptotic distribution of (3.1) as $n \to \infty$ is that of the random
variable

(3.2)  $\sup_{\varepsilon_1 \le t \le 1-\varepsilon_2}|W_0(t)|/[t(1 - t)]^{1/2}$,

where $W_0(t)$, $0 \le t \le 1$, is a Brownian bridge. It is immediately verified by checking the
covariance function that

(3.3)  $W(t) = (1 + t)W_0[t/(1 + t)]$  ($0 \le t < \infty$)

is a standard (driftless) Brownian motion process, so

(3.4)  $P\{\sup_{\varepsilon_1 \le t \le 1-\varepsilon_2}|W_0(t)|/[t(1 - t)]^{1/2} \ge b\} = P\{\sup_{u \le t \le v} t^{-1/2}|W(t)| \ge b\}$,

where $u = \varepsilon_1/(1 - \varepsilon_1)$ and $v = (1 - \varepsilon_2)/\varepsilon_2$.
Hence the asymptotic significance level for the Anderson-Darling statistic equals the significance
level of a repeated significance test for the drift of Brownian motion. In principle one
can compute (3.4) exactly (e.g. DeLong, 1981), but since the answer is very complicated
and is only a crude approximation to the probability of interest, a good and easily evaluated
approximation seems preferable.

Since for any $r > 0$, $r^{-1/2}W(rt)$, $0 \le t < \infty$, is again a standard Brownian motion, it
follows that

$P\{\sup_{u \le t \le v} t^{-1/2}|W(t)| \ge b\}$

depends only on the ratio $vu^{-1}$, not the actual values of $u$ and $v$. Consequently by (3.4)
and (2.29) with $c = b$, as $b \to \infty$

(3.5)  $P\{|W_0(t)| \ge b[t(1 - t)]^{1/2} \text{ for some } \varepsilon_1 \le t \le 1 - \varepsilon_2\}$
       $= (b - b^{-1})\varphi(b)\log[(1 - \varepsilon_1)(1 - \varepsilon_2)/\varepsilon_1\varepsilon_2] + 4b^{-1}\varphi(b) + o(b^{-1}\varphi(b))$.

Comparison of (3.5) with the exact numerical computations of DeLong (1981) shows that it
is quite accurate, even when the probability is not close to 0.
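
The leading terms of an expansion such as (3.5) are easy to evaluate. The sketch below assumes the form $(b - b^{-1})\varphi(b)\log[(1 - \varepsilon_1)(1 - \varepsilon_2)/\varepsilon_1\varepsilon_2] + 4b^{-1}\varphi(b)$, dropping the $o(b^{-1}\varphi(b))$ remainder; the function name and the example thresholds are illustrative assumptions, and no comparison with DeLong's exact values is attempted here.

```python
import math


def anderson_darling_tail(b: float, eps1: float, eps2: float) -> float:
    """Leading terms of the assumed expansion (3.5) for the weighted bridge supremum."""
    if not (0.0 < eps1 < 1.0 - eps2 < 1.0):
        raise ValueError("need 0 < eps1 < 1 - eps2 < 1")
    phi_b = math.exp(-0.5 * b * b) / math.sqrt(2.0 * math.pi)
    # v/u with u = eps1/(1 - eps1), v = (1 - eps2)/eps2
    ratio = (1.0 - eps1) * (1.0 - eps2) / (eps1 * eps2)
    return (b - 1.0 / b) * phi_b * math.log(ratio) + 4.0 * phi_b / b


for b in (2.5, 3.0, 3.5):
    print(f"b={b}: approx tail = {anderson_darling_tail(b, 0.1, 0.1):.4f}")
```

The dependence on $\varepsilon_1, \varepsilon_2$ enters only through the logarithm of the ratio $v/u$, reflecting the scale invariance of Brownian motion noted in the text.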


3.2 Maximum $\chi^2$ Statistics

The random variable (3.2) arises as a limit in distribution in a context which at first
appears to be quite different from the Anderson-Darling statistic.

Suppose that a 2 x 2 table is obtained from a categorical variable A or A* (not A)
and a dichotomized quantitative variable Y, which divides a population according to low
($Y \le y$) and high ($Y > y$) values of Y. See Figure 4.

          Y <= y    Y > y
    A        a        b        N = a + b + c + d
    A*       c        d

Figure 4

This  situation  might  arise  if  A  (A*)  denotes  the  occurrence  (non-occurrence)  of  some  event 
or  presence  (absence)  of  some  disease  and  Y  is  a  diagnostic  predictor  of  the  event  or  disease. 
We  seek  a  cut  point  y*,  which  divides  the  population  into  low  risk  and  high  risk  groups. 

An apparently common ad hoc procedure for choosing $y^*$ is obtained by the following
reasoning. For a given value of $y$, one measure of dependence between the categories $A$ and
$Y \le y$ is the $\chi^2$ statistic

$\chi_y^2 = N(ad - bc)^2/[(a + b)(c + d)(a + c)(b + d)]$,

and larger values of $\chi_y^2$ indicate a larger degree of dependence. Hence we choose $y^*$ to
maximize $\chi_y^2$ (subject to keeping some minimal percentage of the total sample in the $Y \le y^*$
and $Y > y^*$ categories). To assess the "significance" of this predictor, we consider the
distribution of $\max_y \chi_y^2$ under the assumption of independence in the 2 x 2 table for all $y$.
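
The cut point selection just described can be sketched as a simple scan. The code below is illustrative only: it assumes the usual 2 x 2 statistic $N(ad - bc)^2/[(a + b)(c + d)(a + c)(b + d)]$, takes candidate cut points at the observed values of $Y$, and enforces the minimal-percentage constraint through a hypothetical parameter `min_frac`; the function names and the toy data are assumptions.

```python
from typing import Optional, Sequence, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of a 2 x 2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate margin: no evidence of dependence
    return n * (a * d - b * c) ** 2 / denom


def max_chi_square(y: Sequence[float], in_a: Sequence[bool],
                   min_frac: float = 0.1) -> Tuple[Optional[float], float]:
    """Scan cut points; return (best cut, maximal chi-square) subject to min_frac."""
    n = len(y)
    best_cut, best_stat = None, 0.0
    for cut in sorted(set(y)):
        a = sum(1 for v, g in zip(y, in_a) if g and v <= cut)
        b = sum(1 for v, g in zip(y, in_a) if g and v > cut)
        c = sum(1 for v, g in zip(y, in_a) if not g and v <= cut)
        d = sum(1 for v, g in zip(y, in_a) if not g and v > cut)
        low = a + c
        if low < min_frac * n or (n - low) < min_frac * n:
            continue  # too few observations on one side of the cut
        stat = chi_square_2x2(a, b, c, d)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Toy data: category A holds the four smallest Y values.
y_values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [True, True, True, True, False, False, False, False]
print(max_chi_square(y_values, labels, min_frac=0.25))
```

Because the cut point is chosen to maximize the statistic, the resulting maximum cannot be referred to the ordinary $\chi^2$ distribution with one degree of freedom; that is the boundary crossing problem taken up in the text.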

bet  #i(y)  =  P{Y  <  y  \  A},  Fj(y)  =  P{Y  <  y  \  A*}.  The  natural  nonparametric 
estimators  of  Fi  and  /j  are 

ff»(f)  =  «/(°  +  *)  ft(f )  =  e/{e  +  d). 

The  hypothesis  of  independence  in  the  2  x  2  table  for  all  y  is  ITo  :  Fi  =  F2,  and  F[y)  = 
(a  +  e)/N  estimates  the  common  distribution  function  under  Ho.  In  terms  of  Fi,  Ft,  and 
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F,  the  square  root  of  the  maximum  x3  statistic  is 

maxx,  =  max  \Ft(y)  -  A(tr)|/  (p(lf)ll  -  F{y)]  (-J-  +  -J-)  1  , 

9  v  \»i  *»*/ ) 

where $n_1 = a + b$, $n_2 = c + d$. This is the natural definition of a two-sample
Anderson-Darling statistic, which under $H_0$ converges in law to (3.2) as
$\min(n_1, n_2) \to \infty$. See Miller and Siegmund (1982) for a more complete discussion and
numerical examples.
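
The selection of the cut point and the identity between the two forms of the statistic can be illustrated with a short computation. This is a minimal sketch, not the procedure of Miller and Siegmund (1982): the function names, the `min_frac` rule for the minimal percentage on each side of the cut, and the data are all hypothetical.

```python
from math import sqrt


def table_counts(y, in_a, cut):
    """Cell counts (a, b, c, d) of the 2 x 2 table of Figure 4 at cut point `cut`."""
    a = sum(1 for v, g in zip(y, in_a) if g and v <= cut)
    b = sum(1 for v, g in zip(y, in_a) if g and v > cut)
    c = sum(1 for v, g in zip(y, in_a) if not g and v <= cut)
    d = sum(1 for v, g in zip(y, in_a) if not g and v > cut)
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic of a 2 x 2 table; 0.0 when a margin is empty."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom if denom else 0.0


def two_sample_form(a, b, c, d):
    """|F1_hat - F2_hat| / {F_hat (1 - F_hat) (1/n1 + 1/n2)}^(1/2)."""
    n1, n2, n = a + b, c + d, a + b + c + d
    f1, f2, f = a / n1, c / n2, (a + c) / n
    return abs(f1 - f2) / sqrt(f * (1.0 - f) * (1.0 / n1 + 1.0 / n2))


def max_selected_chi_square(y, in_a, min_frac=0.25):
    """Return (cut, chi2) maximizing chi-square over admissible cut points."""
    best_cut, best_chi2 = None, -1.0
    for cut in sorted(set(y)):
        a, b, c, d = table_counts(y, in_a, cut)
        if min(a + c, b + d) < min_frac * len(y):
            continue  # too few observations on one side of the cut
        chi2 = chi_square(a, b, c, d)
        if chi2 > best_chi2:
            best_cut, best_chi2 = cut, chi2
    return best_cut, best_chi2


# Hypothetical data: predictor values and indicators of the event A.
y = [1, 2, 3, 4, 5, 6, 7, 8]
in_a = [True, True, True, False, True, False, False, False]
cut, chi2 = max_selected_chi_square(y, in_a)
print(cut, chi2, two_sample_form(*table_counts(y, in_a, cut)) ** 2)
```

The last line prints the maximizing cut, the maximal chi-square value, and the square of the two-sample form at that cut; the last two agree, which is the algebraic identity used in the text.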

Although the probabilistic aspect of this problem is already solved, natural and simple
generalizations to deal with more than one predictor variable dichotomized, say, by a
hyperplane seem extremely difficult. See Halpern (1982) for a more precise formulation and
Monte Carlo study.

3.3  Introduction  to  Change  Point  Problems 

In these final four sections we shall discuss detection and estimation of the time(s) of
an abrupt change in the distribution of a sequence of observations $x_1, x_2, \dots$. To simplify
the discussion, assume that the $x_i$ are independent and normally distributed with means
$\mu^{(i)}$ and variance 1. Change point problems appear to have arisen originally in the context
of quality control, where one observes the output of a production process sequentially and
wants to signal any departure of the average output from some known target value $\mu_0$.
Outstanding contributions in a long line of papers on sequential detection are Page (1954),
Shiryayev (1963), Lorden (1970), and Pollak (1985, to appear).

In the following we consider only fixed sample problems involving a finite sequence
$x_1, x_2, \dots, x_m$. The specific problems to be discussed are to test the null hypothesis of no
change $H_0 : \mu^{(1)} = \cdots = \mu^{(m)}$ against the alternative of exactly one change,

$H_1 : \exists\ 1 \le \rho < m$ such that $\mu^{(1)} = \cdots = \mu^{(\rho)} = \mu_0 \ne \mu_1 = \mu^{(\rho+1)} = \cdots = \mu^{(m)}$,

or against the "epidemic" or "square wave" alternative,

$H_1 : \exists\ 1 \le \rho_1 < \rho_2 < m$ such that $\mu^{(1)} = \cdots = \mu^{(\rho_1)} = \mu^{(\rho_2+1)} = \cdots = \mu^{(m)} = \mu_0$,
$\mu^{(\rho_1+1)} = \cdots = \mu^{(\rho_2)} = \mu_0 + \delta$.

We also consider estimation of $\rho$ by a confidence set when the hypothesis of exactly one
change is assumed to be true. Typically $\mu_0$ and $\delta = \mu_1 - \mu_0$ are unknown, but it sometimes
seems reasonable to suppose that a particular value of $\delta$ is a minimum threshold of interest
and hence to regard $\delta$ as known for the purpose of deriving a test statistic.

Examples  of  change  point  problems  in  epidemiology  are  described  by  Worsley  (1983) 
and  by  Levin  and  Kline  (1984).  Here  one  is  interested  in  testing  whether  the  incidence  of  a 
disease  has  remained  constant  over  time,  and  if  not,  in  estimating  the  time(s)  of  change(s) 
in  order  to  suggest  possible  causes.  Kendall  and  Kendall  (1980)  describe  an  interesting 
change  point  problem  in  archaeology,  and  Brown,  Durbin,  and  Evans  (1975)  give  a  number 
of  econometric  examples. 

In  Section  3.4  we  consider  the  likelihood  ratio  test  of  no  change  against  the  alternative 
of  exactly  one  change.  A  large  number  of  test  statistics  have  been  proposed  for  this  problem, 
and  there  is  no  attempt  to  compare  them  here.  The  main  conclusion  is  that  the  methods 
of  Part  2  provide  the  basic  tools  to  study  a  number  of  these  tests  without  resorting  to  the 
numerical  or  Monte  Carlo  efforts  that  have  been  the  basis  of  earlier  studies  (e.g.  Sen  and 
Srivastava,  1975). 

Section 3.5 is concerned with finding a confidence set for $\rho$. In the case where $\mu_0$ and
$\mu_1$ are both known, we compare confidence intervals based on the maximum likelihood
estimator $\hat\rho$ and confidence sets (which generally are not intervals) based directly on the
likelihood function. Hinkley (1970, 1972) has mentioned both methods, but he directs his
efforts primarily at computational problems and does not compare the two methods
quantitatively. To minimize the computational difficulties and facilitate a simple comparison,
we consider the case of Brownian motion. The likelihood based method appears to be
preferable, and it is extended to the case of unknown nuisance parameters, $\mu_0$ and $\mu_1$.

Section  3.6  is  concerned  with  testing  the  hypothesis  of  no  change  against  an  epidemic 
alternative.  Here  one  encounters  processes  with  a  multidimensional  indexing  set,  which 
introduce  some  new  problems.  The  methods  of  Part  2  can  be  used  in  some  special  cases, 
but  in  others  an  adaptation  of  ideas  of  Bickel  and  Rosenblatt  (1973)  or  Qualls  and  Watanabe 
(1973)  seems  more  fruitful. 

3.4 Tests Against the Alternative of Exactly One Change

The problem of testing the null hypothesis of no change, $H_0 : \mu^{(1)} = \cdots = \mu^{(m)} = \mu_0$,
against the alternative of exactly one change, $H_1 : \exists\ 1 \le \rho < m$ such that
$\mu^{(1)} = \cdots = \mu^{(\rho)} = \mu_0 \ne \mu_1 = \mu^{(\rho+1)} = \cdots = \mu^{(m)}$ ($\mu_0$ and $\mu_1$ both unknown), has been
widely discussed, and a number of test statistics have been proposed. The quasi-Bayesian
statistics of Chernoff and Zacks (1964) and Gardner (1969) are analytically tractable, but
maximum likelihood type statistics have typically been studied by numerical or Monte Carlo
methods (e.g. Sen and Srivastava, 1975, Worsley, 1983).

The square root of the log likelihood ratio statistic is proportional to

(3.6)  $\max_{1 \le k < m} \{|S_k - k S_m/m| / [k(1 - k/m)]^{1/2}\}$.

A simple heuristic derivation of this statistic with minimal calculation is to suppose
momentarily that $H_1$ specifies $\rho = k$. The problem then becomes a two population test to
decide whether the mean ($\mu_0$) of the first $k$ observations equals the mean ($\mu_1$) of the last
$m - k$. The standard test statistic is the normalized difference between the mean of the
first $k$ observations, $S_k/k$, and the overall mean, $S_m/m$. This is just (3.6) without the max,
which accounts for the fact that $\rho$ is actually unknown.

Slightly more generally, we shall consider

(3.7)  $\max_{m_0 \le k \le m_1} \{|S_k - k S_m/m| / [k(1 - k/m)]^{1/2}\}$,

where $1 \le m_0 < m_1 < m$. (A justification is given below.)

To obtain some intuition for the virtues and defects of (3.7) consider also the ad hoc
suggestion of Pettitt (1980)

(3.8)  $\max_{1 \le k < m} |S_k - k S_m/m|$.

Under $H_0$ the process $S_k - k S_m/m$, $k = 0, 1, \dots, m$, is the same as the conditional
process $S_k$, $k = 0, 1, \dots, m$, given that $S_m = 0$, i.e. the same as a Brownian bridge observed
at discrete instants of time. Hence an excellent approximation to the significance level of
(3.8) can be obtained from (2.11) or (2.16) (multiplied by 2 to account for the two-sided
alternative); the significance level of (3.7) is discussed below.
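
The two kinds of statistic are easy to compute from the centered partial sums. The following is a minimal sketch with hypothetical function names and data; in particular it assumes that (3.8) is the unnormalized maximum of $|S_k - k S_m/m|$ over $1 \le k < m$, which is how the statistic is read here.

```python
from math import sqrt


def centered_sums(x):
    """Return [S_k - k * S_m / m for k = 0, ..., m]."""
    m, total = len(x), sum(x)
    path, s = [0.0], 0.0
    for k, value in enumerate(x, start=1):
        s += value
        path.append(s - k * total / m)
    return path


def likelihood_ratio_statistic(x, m0=1, m1=None):
    """Statistic (3.7); the defaults m0 = 1, m1 = m - 1 give (3.6)."""
    m = len(x)
    m1 = m - 1 if m1 is None else m1
    path = centered_sums(x)
    return max(abs(path[k]) / sqrt(k * (1.0 - k / m)) for k in range(m0, m1 + 1))


def unnormalized_statistic(x):
    """ASSUMED form of (3.8): the maximum of |S_k - k S_m / m| over 1 <= k < m."""
    return max(abs(v) for v in centered_sums(x)[1:-1])


x = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]  # hypothetical data with a jump after k = 3
print(likelihood_ratio_statistic(x), unnormalized_statistic(x))
```

Restricting the range through `m0` and `m1` simply drops the values of $k$ near the endpoints from the maximum, which is the flexibility that (3.7) adds to (3.6).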


Under $H_1$ the drift of $S_k - k S_m/m$, $k = 0, 1, \dots, m$, is

(3.9)  $-k(1 - \rho/m)(\mu_1 - \mu_0)$ for $k \le \rho$,
       $-(\rho/m)(m - k)(\mu_1 - \mu_0)$ for $k > \rho$,

and  the  residual  process  after  subtracting  out  the  drift  is  again  a  Brownian  bridge  observed 
at  discrete  instants  of  time. 
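
The drift can be checked directly by taking expectations term by term: $E S_k = k\mu_0$ for $k \le \rho$, $E S_k = \rho\mu_0 + (k - \rho)\mu_1$ for $k > \rho$, and $E S_m = \rho\mu_0 + (m - \rho)\mu_1$. This gives a piecewise linear function of $k$ that vanishes at $k = 0$ and $k = m$, peaks in absolute value at $k = \rho$, and carries the sign of $\mu_0 - \mu_1$ (the sign is immaterial for the two-sided statistics above). The sketch below, with hypothetical names and parameter values, compares the closed form with the direct computation.

```python
def drift_formula(k, m, rho, mu0, mu1):
    """Closed-form mean of S_k - k * S_m / m under exactly one change after rho."""
    if k <= rho:
        return -k * (1.0 - rho / m) * (mu1 - mu0)
    return -(rho / m) * (m - k) * (mu1 - mu0)


def drift_direct(k, m, rho, mu0, mu1):
    """Same quantity computed from the vector of means of x_1, ..., x_m."""
    means = [mu0] * rho + [mu1] * (m - rho)
    return sum(means[:k]) - k * sum(means) / m


# Hypothetical parameter values for illustration.
m, rho, mu0, mu1 = 10, 4, 1.0, 2.5
for k in range(m + 1):
    print(k, drift_formula(k, m, rho, mu0, mu1), drift_direct(k, m, rho, mu0, mu1))
```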

It  seems  intuitively  clear  from  (3.9)  as  illustrated  in  Figure  5  that  (3.8)  is  more  powerful 
than  (3.6)  for  detecting  changes  that  occur  near  m/2,  whereas  the  converse  is  true  for 
changes  occurring  near  the  endpoints  0  and  m. 


Figure 5


It is intrinsically difficult to detect a change that occurs near one or the other endpoint,
and the likelihood ratio statistic pays for its efforts to do so by giving up power near
$\rho = m/2$. The introduction of $m_0$ and $m_1$ in (3.7) gives the statistician the flexibility to
give up some power to detect changes occurring near the endpoints in return for an increase
in power elsewhere.

By a conditioning argument one can obtain approximations to the power of (3.7) and (3.8),
which  can  be  used  to  compare  these  statistics  with  each  other  and  with  other  proposals  (e.g. 
the  recursive  residual  test  of  Brown,  Durbin,  and  Evans,  1975).  A  more  complete  discussion 
will  appear  in  a  future  publication  of  B.  and  K.  James.  To  illustrate  the  applicability  of 
the  methods  of  Part  2,  and  to  prepare  for  the  discussion  of  confidence  sets  in  Section  3.5, 
an  approximation  to  the  significance  level  of  (3.7)  is  given  below. 

Let $x_1, x_2, \dots, x_m$ be independent standard normal random variables, and put
$S_n = x_1 + \cdots + x_n$. We continue to use the notation

$$P_\xi^{(m)}(A) = P(A \mid S_m = \xi), \qquad A \in \sigma(x_1, \dots, x_m).$$

Let $b > 0$, $m = 2, 3, \dots$, $1 \le m_0 < m$, and define

(3.9)  $T = \inf\{n : n \ge m_0,\ |S_n| \ge b[n(1 - n/m)]^{1/2}\}$.


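Let $m_0 < m_1 \le m - 1$. The significance level of the test defined by (3.7) is

(3.10)  $P_0^{(m)}\{T \le m_1\} = P_0^{(m)}\{|S_{m_1}| \ge b[m_1(1 - m_1/m)]^{1/2}\}$
        $\quad + \int_{|\xi| < b[m_1(1 - m_1/m)]^{1/2}} P_\xi^{(m_1)}\{T < m_1\}\, P_0^{(m)}\{S_{m_1} \in d\xi\}$.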


Theorem 3.11. Assume that $b \to \infty$, $m_0 \to \infty$, $m_1 \to \infty$, $m \to \infty$ in such a way that for
some $0 < t_0 < t_1 < 1$ and $\mu_1 > 0$

$$m_i/m \to t_i \ (i = 0, 1) \quad\text{and}\quad b/m^{1/2} = \mu_1.$$

Let $\xi = m\xi_0$ for some $|\xi_0| \in (\mu_1(1 - t_1)[t_0/(1 - t_0)]^{1/2},\ \mu_1[t_1(1 - t_1)]^{1/2})$.
Then as $m \to \infty$,

$$P_\xi^{(m_1)}\{T < m_1\} \sim [t_1(1 - t_1)]^{1/2}\mu_1\xi_0^{-1}\,\nu[\mu_1^2(1 - t_1)/\xi_0 + \xi_0/(1 - t_1)]\,\exp\{-\tfrac12 m(\mu_1^2 - \xi_0^2/t_1(1 - t_1))\},$$

where $\nu$ is given by (2.12).

Remarks. Substitution of this asymptotic expression into (3.10) suggests the approximation

(3.12)  $P_0^{(m)}\{T \le m_1\} \cong 2b\,\varphi(b)\int_{b(m_1^{-1} - m^{-1})^{1/2}}^{b(m_0^{-1} - m^{-1})^{1/2}} x^{-1}\nu(x + b^2/mx)\,dx + 2[1 - \Phi(b)]$,

which  can  be  shown  to  be  a  valid  asymptotic  relation  (even  if  mo  and  m  -  mi  are  o(m)).  A 
proof  of  Theorem  3.11  along  the  lines  of  Theorem  2.18  has  an  interesting  twist,  leading  to 
some  new  technical  problems.  An  informal  discussion  is  contained  in  Appendix  1.  Siegmund 
(1985,  Chapter  XI)  derives  (3.12)  directly  without  first  obtaining  Theorem  3.11.  However, 
we  shall  find  Theorem  3.11  to  be  of  interest  in  its  own  right  in  the  next  section. 
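
The substitution leading from Theorem 3.11 to (3.12) can be sketched as follows. This is an informal outline under the reading of (3.10) and Theorem 3.11 used here (with $t_1 = m_1/m$, $\mu_1 = b m^{-1/2}$, $\xi = m\xi_0$, and $\sigma^2$ introduced as shorthand for the variance of $S_{m_1}$ under $P_0^{(m)}$); it is not a proof.

```latex
% Informal sketch; sigma^2 is shorthand introduced here, not notation of the paper.
\begin{align*}
\sigma^2 &= m_1(1 - m_1/m), \qquad
  P_0^{(m)}\{S_{m_1} \in d\xi\} = \sigma^{-1}\varphi(\xi/\sigma)\,d\xi, \\
P_\xi^{(m_1)}\{T < m_1\} &\approx \frac{b\sigma}{|\xi|}\,
  \nu\!\left(\frac{b^2(1 - m_1/m)}{|\xi|} + \frac{|\xi|}{m - m_1}\right)
  \exp\{-\tfrac12(b^2 - \xi^2/\sigma^2)\}, \\
P_\xi^{(m_1)}\{T < m_1\}\,P_0^{(m)}\{S_{m_1} \in d\xi\} &\approx
  b\,\varphi(b)\,|\xi|^{-1}\,
  \nu\!\left(\frac{b^2(1 - m_1/m)}{|\xi|} + \frac{|\xi|}{m - m_1}\right) d\xi, \\
x = \frac{b^2(1 - m_1/m)}{|\xi|} \;&\Longrightarrow\;
  \frac{d\xi}{|\xi|} = -\frac{dx}{x}, \qquad
  \frac{b^2}{mx} = \frac{|\xi|}{m - m_1}.
\end{align*}
% The range of |xi| in Theorem 3.11 maps to
%   b (1/m_1 - 1/m)^{1/2} <= x <= b (1/m_0 - 1/m)^{1/2};
% the two signs of xi give the factor 2, and
%   P_0^{(m)}{ |S_{m_1}| >= b sigma } = 2[1 - Phi(b)].
```

The exponential factor of Theorem 3.11 cancels against the normal density of $S_{m_1}$, which is why only $\varphi(b)$ survives in front of the integral.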

Table  5  gives  an  indication  of  the  accuracy  of  (3.12).  For  comparison  an  exact  numerical 
calculation  from  Worsley  (1983)  or  the  result  of  a  Monte  Carlo  experiment  plus  or  minus  one 
standard  error  is  also  given.  There  were  2500  repetitions  of  the  Monte  Carlo  experiment, 
and  importance  sampling  along  the  lines  discussed  in  Siegmund  (1975)  was  used  for  variance 
reduction. 

Table 5

$P_0^{(m)}\{T \le m_1\}$

                                      Probability
   b      m_0    m_1     m     Approximation (3.12)    Exact or Monte Carlo

  2.91     1      3      4           .010                 .01*
  2.65     1      9     10           .052                 .05*
  2.38     1      9     10           .105                 .10*
  2.38     4      8     10           .057                 .058 ± .002
  2.5      1     19     20           .120                 .110 ± .002
  2.5      1     10     20           .066                 .063 ± .001
  2.5      4     16     20           .073                 .074 ± .002

* Exact values from Worsley (1983)
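
The approximation (3.12) is straightforward to evaluate numerically. The sketch below integrates $x^{-1}\nu(x + b^2/mx)$ over $b(m_1^{-1} - m^{-1})^{1/2} \le x \le b(m_0^{-1} - m^{-1})^{1/2}$, the range obtained by substituting Theorem 3.11 into (3.10), using Simpson's rule. Since (2.12)-(2.14) are not reproduced in this section, it assumes the simple stand-in $\nu(x) \approx \exp(-0.583x)$; the function names are hypothetical, and the values it produces need not match Table 5 to the last digit.

```python
from math import erfc, exp, pi, sqrt


def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def upper_tail(z):
    """1 - Phi(z) for the standard normal distribution."""
    return 0.5 * erfc(z / sqrt(2.0))


def nu(x):
    """ASSUMPTION: nu(x) ~ exp(-0.583 x), standing in for (2.12)-(2.14)."""
    return exp(-0.583 * x)


def significance_level(b, m0, m1, m, steps=2000):
    """Right-hand side of (3.12) by Simpson's rule (steps must be even)."""
    lo = b * sqrt(1.0 / m1 - 1.0 / m)
    hi = b * sqrt(1.0 / m0 - 1.0 / m)

    def integrand(x):
        return nu(x + b * b / (m * x)) / x

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(lo + i * h)
    return 2.0 * b * phi(b) * (total * h / 3.0) + 2.0 * upper_tail(b)


# Parameter combinations taken from Table 5.
for row in [(2.91, 1, 3, 4), (2.65, 1, 9, 10), (2.5, 4, 16, 20)]:
    print(row, round(significance_level(*row), 3))
```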

3.5 Confidence Sets for $\rho$

This section is concerned with finding confidence sets for $\rho$, when $\mu_0$ and $\mu_1$ are
regarded as nuisance parameters. Initially we shall assume that $\mu_0$ and $\mu_1$ are both known,
a case studied in considerable detail by Hinkley (1970, 1972), who suggested a method
based on the maximum likelihood estimator $\hat\rho$ and a second method based directly on the
likelihood function. In order to simplify the computational difficulties as much as possible
and obtain a picture of the relative merits of these two proposals, we begin with the case of
Brownian motion observed for $0 \le t \le m$. As the results of Part 2 indicate, use of Brownian
motion as an approximation usually yields quantitatively poor results for boundary crossing
probabilities. For comparing competing procedures, however, Brownian motion can be
quite useful.

Hence let $W(t)$, $0 \le t \le m$, be standard Brownian motion, and assume that the
observed process $X(t)$, $0 \le t \le m$, satisfies

$$dX(t) = \mu_0\,dt + dW(t) \ \text{ for } 0 \le t \le \rho, \qquad dX(t) = \mu_1\,dt + dW(t) \ \text{ for } \rho < t \le m,$$

where $\mu_0$ and $\mu_1$ are both known and $\rho$ is unknown. There is no loss of generality in taking
$\mu_0 = 0$. Put $\delta = \mu_1$. The likelihood function at $\rho = t$ is proportional to
$\exp[\tfrac12\delta^2 t - \delta X(t)]$. Hence the log likelihood $\ell(t) = \tfrac12\delta^2 t - \delta X(t)$ satisfies

$$d\ell(t) = \tfrac12\delta^2\,dt - \delta\,dW(t), \ 0 \le t \le \rho, \qquad d\ell(t) = -\tfrac12\delta^2\,dt - \delta\,dW(t), \ \rho < t \le m,$$

i.e. $\ell(t)$ is Brownian motion with drift $\tfrac12\delta^2$ or $-\tfrac12\delta^2$ according as $t$ is $< \rho$ or $> \rho$,
and $\hat\rho$ is the time at which this process takes on its maximum value.

It is easy to compute the distribution of $\hat\rho$, but to simplify the resulting expression we
assume that $\rho$ and $m - \rho$ are effectively infinitely large. Consider

$$P_\rho\{\hat\rho - \rho \in (t, t + dt),\ \ell(\hat\rho) - \ell(\rho) \in (x, x + dx)\}.$$

This joint density can be evaluated by (i) conditioning on $\ell(t) - \ell(\rho) = x - y(dt)^{1/2}$,
$\ell(t + dt) - \ell(\rho) = x - z(dt)^{1/2}$, (ii) computing the (conditional) probability that the
process $\ell(s) - \ell(\rho)$ does not attain the value $x$ for $s < t$ nor for $s > t + dt$ and its maximum
in the interval $(t, t + dt)$ is in $(x, x + dx)$, and (iii) integrating out $y$ and $z$ over $(0, \infty)$. See
Figure 6. The joint density is
$\delta^{-1}x|t|^{-3/2}[1 - \exp(-x)]\,\varphi[x/\delta|t|^{1/2} + \tfrac12\delta|t|^{1/2}]\,dx\,dt$ for $x > 0$ and
0 otherwise. Integration over $x \in (0, \infty)$ and $t \in (r, \infty)$ ($r > 0$) yields

(3.13)  $P_\rho\{\hat\rho - \rho > r\} = \Phi(-\tfrac12\delta r^{1/2})(\tfrac12\delta^2 r + \tfrac52) - \delta r^{1/2}\varphi(\tfrac12\delta r^{1/2}) - \tfrac32\exp(\delta^2 r)\,\Phi(-3\delta r^{1/2}/2)$.

It follows from (3.13) that the length of a 95% confidence interval for $\rho$ obtained by treating
$\hat\rho - \rho$ as a pivotal quantity is about $22/\delta^2$. (Without the assumption that $\rho$ and $m - \rho$
are infinitely large, $\hat\rho - \rho$ would not be an exact pivotal.)
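
The interval length can be recovered numerically. The sketch below assumes that (3.13) has the form $P_\rho\{\hat\rho - \rho > r\} = (\tfrac12 s^2 + \tfrac52)\Phi(-\tfrac12 s) - s\varphi(\tfrac12 s) - \tfrac32 e^{s^2}\Phi(-\tfrac32 s)$ with $s = \delta r^{1/2}$, which equals $\tfrac12$ at $r = 0$ as symmetry requires, and solves for the upper 2.5% point by bisection. The function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * erfc(-z / sqrt(2.0))


def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def tail_probability(r, delta):
    """P{rho_hat - rho > r} in the form of (3.13) assumed above (r >= 0)."""
    s = delta * sqrt(r)
    return ((0.5 * s * s + 2.5) * Phi(-0.5 * s)
            - s * phi(0.5 * s)
            - 1.5 * exp(s * s) * Phi(-1.5 * s))


def interval_half_length(alpha, delta):
    """Solve tail_probability(r, delta) = alpha / 2 for r by bisection."""
    lo, hi = 0.0, 100.0 / delta ** 2
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if tail_probability(mid, delta) > alpha / 2.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Length of the symmetric 95% interval in units of 1 / delta^2.
print(2.0 * interval_half_length(0.05, 1.0))
```

Because $\hat\rho - \rho$ enters only through $\delta^2 r$, the length for any other $\delta$ is the printed value divided by $\delta^2$.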

The definition of a likelihood based confidence set is very simple. For $x > 0$ let
$A(\rho) = \{\sup_t[\ell(t) - \ell(\rho)] \le x\}$. Choose $x$ so that $P_\rho\{A(\rho)\} = 1 - \alpha$ and define the
confidence set to be those values $t$ for which $\ell(t) \ge \ell(\hat\rho) - x$. Again assuming that $\rho$ and
$m - \rho$ are effectively infinite, one easily sees that (cf. Figure 6)

$$P_\rho\{A(\rho)\} = [1 - \exp(-x)]^2,$$

so

$$x = -\log[1 - (1 - \alpha)^{1/2}].$$

Observe  that  the  confidence  set  obtained  in  this  manner  is  by  no  means  an  interval. 
In  fact,  because  of  the  rapid  fluctuations  of  Brownian  sample  paths,  with  probability  one  it 
consists  of  the  union  of  infinitely  many  open  intervals. 
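
On a discretized path the construction amounts to one threshold and one comparison. The sketch below uses hypothetical names and a made-up log likelihood path; it computes $x = -\log[1 - (1 - \alpha)^{1/2}]$ and keeps the grid points whose log likelihood is within $x$ of the maximum.

```python
from math import exp, log, sqrt


def threshold(alpha):
    """x solving [1 - exp(-x)]^2 = 1 - alpha."""
    return -log(1.0 - sqrt(1.0 - alpha))


def likelihood_set(times, loglik, alpha):
    """Grid points t with loglik(t) >= max(loglik) - x."""
    x = threshold(alpha)
    peak = max(loglik)
    return [t for t, value in zip(times, loglik) if value >= peak - x]


# Hypothetical log likelihood values on a grid of five time points.
times = [0, 1, 2, 3, 4]
loglik = [0.0, 2.0, 5.0, 1.0, 4.5]
print(threshold(0.05), likelihood_set(times, loglik, 0.05))
```

In this made-up example the selected grid points are not contiguous, which illustrates in miniature why the confidence set need not be an interval.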


To compare this likelihood based confidence set with the confidence interval determined
above, we compute the expected size of the confidence set, i.e.

(3.14)  $E_\rho[\lambda\{t : \omega \in A(t)\}] = \int_{-\infty}^{\infty} P_\rho[A(t)]\,dt$,

where $\lambda$ denotes Lebesgue measure and $\omega$ is a sample path $\ell(t) - \ell(\rho)$, $-\infty < t < \infty$. By
conditioning on $\ell(t)$ one can derive an expression for $P_\rho[A(t)]$, which when integrated shows
that (3.14) equals

$$4\delta^{-1}[1 - \exp(-x)]\{x\delta^{-1} - (2\delta)^{-1}[1 - \exp(-x)]\}$$
$$= 4\delta^{-2}(1 - \alpha)^{1/2}\{-\log[1 - (1 - \alpha)^{1/2}] - \tfrac12(1 - \alpha)^{1/2}\}.$$

For a 95% confidence set the expected size is about $10.5/\delta^2$, or less than one half the length
of the corresponding confidence interval based on the distribution of $\hat\rho$. Numerical evidence
indicates that the likelihood based set has approximately the same expected size advantage
throughout the range of commonly used confidence levels.

Remark. Cobb (1978) has proposed yet a third confidence set (interval) for $\rho$, which is
in a certain respect intermediate between the two proposed here. Given a suitable $t_0 > 0$,
Cobb treats $\ell(\hat\rho) - \ell(t)$, $|t - \hat\rho| \le t_0$, as ancillary and bases his interval on the conditional
distribution of $\hat\rho - \rho$ given this ancillary statistic. Thus, if the likelihood function drops
off sharply from its maximum at $\hat\rho$, Cobb's interval is short, a property shared by the
likelihood based confidence set. There is some arbitrariness in the choice of $t_0$, which seems
a definite disadvantage if one tries to adapt this method to the case of unknown $\mu_0$ and $\mu_1$,
especially if $m$ is of moderate size. Nonetheless, it would be interesting to compare Cobb's
method with those described above.

Now suppose that our observations are $x_1, \dots, x_m$ as in Section 3.4, that the hypothesis
of exactly one change is true, and that $\mu_0$ and $\mu_1$ are unknown nuisance parameters. A
likelihood based confidence set for $\rho$ can be defined as follows. The log likelihood ratio
statistic for testing the hypothesis $\rho = \rho_0$ against the alternative of arbitrary $\rho$ is (cf. (3.6))

$$\max_k[(S_k - k S_m/m)^2/k(1 - k/m)] - (S_{\rho_0} - \rho_0 S_m/m)^2/\rho_0(1 - \rho_0/m).$$

Hence for $1 \le m_0 < m_1 < m$ and $c > 0$ define the events

(3.15)  $A(\rho, c) = \{\max_{m_0 \le k \le m_1}[(S_k - k S_m/m)^2/k(1 - k/m)] - (S_\rho - \rho S_m/m)^2/\rho(1 - \rho/m) \le c^2\}$.

Although the unconditional probability of $A(\rho, c)$ depends on both $\rho$ and $\delta = \mu_1 - \mu_0$, its
conditional probability given that $S_\rho - \rho S_m/m = \xi$, say, does not depend on $\delta$.
(Conditionally, $S_k - k S_m/m - \xi k/\rho$, $k = 0, 1, \dots, \rho$, and
$S_k - k S_m/m - \xi(m - k)/(m - \rho)$, $k = \rho, \rho + 1, \dots, m$, are two stochastically independent
Brownian bridges in discrete time.) Hence one can in principle determine $c = c(\alpha, \rho, \xi)$
such that

(3.16)  $P_\rho\{A(\rho, c) \mid S_\rho - \rho S_m/m = \xi\} = 1 - \alpha$.

From (3.16) it follows immediately that the set of all $\rho$ such that the sample path
$\omega = \{S_k - k S_m/m,\ k = 0, 1, \dots, m\}$ belongs to

$$A[\rho, c(\alpha, \rho, S_\rho - \rho S_m/m)]$$

is a $(1 - \alpha)100\%$ confidence set for $\rho$.

To  implement  this  procedure  one  must  compute  the  conditional  probability  in  (3.16); 
but  this  problem  is  already  solved  (asymptotically)  in  Theorem  3.11,  as  follows. 

In terms of the stopping time $T$ defined in (3.9)

(3.17)  $P_\rho\{A^c(\rho, c) \mid S_\rho - \rho S_m/m = \xi\} = P_\xi^{(\rho)}\{T < \rho\} + P_\xi^{(m-\rho)}\{T < m - \rho\}$
        $\quad - P_\xi^{(\rho)}\{T < \rho\}\,P_\xi^{(m-\rho)}\{T < m - \rho\}$,

where $A^c$ denotes the complement of $A$ and $b = [c^2 + \xi^2/\rho(1 - \rho/m)]^{1/2}$. If in addition to
the asymptotic normalization of Theorem 3.11 one assumes that $c^2$ is proportional to $m$ and
$\rho/m$ equals some constant in (0, 1), then Theorem 3.11 and (3.17) yield

(3.18)  $P_\rho\{A^c(\rho, c) \mid S_\rho - \rho S_m/m = \xi\} \sim \exp(-\tfrac12 c^2)[1 + \rho(1 - \rho/m)c^2/\xi^2]^{1/2}$
        $\quad \cdot \{\nu[c^2(1 - \rho/m)/\xi + \xi/\rho(1 - \rho/m)] + \nu[c^2\rho/m\xi + \xi/\rho(1 - \rho/m)]\}$,

where $\nu$ is defined in (2.12) and evaluated approximately in (2.13)-(2.14).


Table  6  indicates  the  accuracy  of  (3.18).  To  obtain  a  Monte  Carlo  estimate  of  the 
desired  probability,  importance  sampling  (Siegmund,  1975)  was  used  to  obtain  independent 


estimates  of  the  two  probabilities  on  the  right  hand  side  of  (3.17).  The  standard  error  of 
the  overall  estimate  was  obtained  via  the  obvious  Taylor  series  expansion.  The  number  of 
repetitions  in  each  Monte  Carlo  experiment  was  900. 


Table 6

$P_\rho\{A^c(\rho, c) \mid S_\rho - \rho S_m/m = \xi\}$

                                               Probability
   m      c      ρ     ξ_0 = ξ m^{-1}    Approximation (3.18)    Monte Carlo Estimate

  40     2.5    20        .50                 .027                 .029 ± .001
  40     2.4    12        .25                 .059                 .058 ± .001
  20     2.4    10        .25                 .066                 .059 ± .001
  20     2.4     6        .25                 .057                 .052 ± .001
  20     2.2     6        .50                 .043                 .042 ± .002
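
The right-hand side of (3.18) involves only elementary functions once $\nu$ is available. The sketch below reads (3.18) as $\exp(-\tfrac12 c^2)[1 + \rho(1 - \rho/m)c^2/\xi^2]^{1/2}$ times the sum of two $\nu$ terms, which is what Theorem 3.11 gives with $b^2 = c^2 + \xi^2/\rho(1 - \rho/m)$, and it assumes the stand-in $\nu(x) \approx \exp(-0.583x)$ because (2.12)-(2.14) are not reproduced in this section. The function names are hypothetical, and small differences from the tabulated values are to be expected.

```python
from math import exp, sqrt


def nu(x):
    """ASSUMPTION: nu(x) ~ exp(-0.583 x), standing in for (2.12)-(2.14)."""
    return exp(-0.583 * x)


def approximation_3_18(m, c, rho, xi0):
    """Right-hand side of (3.18) with xi = xi0 * m."""
    xi = xi0 * m
    v = rho * (1.0 - rho / m)                     # rho (1 - rho/m)
    lead = exp(-0.5 * c * c) * sqrt(1.0 + v * c * c / xi ** 2)
    left = nu(c * c * (1.0 - rho / m) / xi + xi / v)
    right = nu(c * c * rho / (m * xi) + xi / v)
    return lead * (left + right)


# Parameter combinations taken from Table 6.
for row in [(40, 2.5, 20, 0.50), (40, 2.4, 12, 0.25), (20, 2.4, 10, 0.25)]:
    print(row, round(approximation_3_18(*row), 3))
```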

3.6 Tests Against the Epidemic Alternative

In this section we consider several tests of the hypothesis of no change
$H_0 : \mu^{(1)} = \mu^{(2)} = \cdots = \mu^{(m)} = \mu_0$, against the epidemic or square wave alternative,
$H_1 : \exists\ 1 \le \rho_1 < \rho_2 < m$ such that $\mu^{(1)} = \cdots = \mu^{(\rho_1)} = \mu^{(\rho_2+1)} = \cdots = \mu^{(m)} = \mu_0$,
$\mu^{(\rho_1+1)} = \cdots = \mu^{(\rho_2)} = \mu_0 + \delta$. Results for this problem are incomplete, and our goals
will be (i) to show that these tests naturally involve new boundary crossing problems and
(ii) to suggest possible approaches to their solution. The problems are different from those
discussed earlier in this paper, and the methods of Part 2 seem of limited usefulness. A
promising alternative approach is provided by the method of Pickands (1969) as developed
independently by Bickel and Rosenblatt (1973) and Qualls and Watanabe (1973). The results
presented here are joint work with M. Hogan, which will be described in greater detail in a
future publication.

Typically $\mu_0$ and $\delta$ are unknown nuisance parameters, although often only one-sided
alternatives with $\delta > 0$, say, are of interest. We shall assume that there is some threshold
change, $\delta_0$, which one is interested in detecting and consider tests for the particular
alternative $\delta = \delta_0$. Thus in effect we assume that $\delta$ is known for the purpose of deriving a
test statistic, although a complete evaluation of that statistic would involve all values of
$\delta$, not just the hypothesized value $\delta_0$.

In contrast to the alternative of exactly one change, the epidemic alternative has rarely
been considered. See Levin and Kline (1984) and Bhattacharya and Brockwell (1976) for
two quite different discussions.


Assume that $\delta = \delta_0$ is known. In the unlikely case that $\mu_0$ is also known the log
likelihood ratio statistic for testing $H_0$ against $H_1$ is proportional to

(3.19)  $Z_1 = \max_{0 \le i < j \le m}[S_j - j\mu_0 - (S_i - i\mu_0) - \tfrac12\delta_0(j - i)]$
        $\quad = \max_{0 \le i < j \le m}[(S_j - j\mu_0 - j\delta_0/2) - (S_i - i\mu_0 - i\delta_0/2)]$.

For the case of unknown $\mu_0$ Levin and Kline (1984) suggest the use of (3.19) with $\mu_0$
replaced by its maximum likelihood estimate under $H_0$, namely $\hat\mu_0 = m^{-1}S_m$, to obtain

(3.20)  $Z_2 = \max_{0 \le i < j \le m}[S_j - j S_m/m - (S_i - i S_m/m) - (j - i)\delta_0/2]$.

The actual log likelihood ratio statistic in the case of known $\delta = \delta_0$ and unknown $\mu_0$ is
easily calculated to be

(3.21)  $Z_3 = \max_{0 \le i < j \le m}[S_j - S_i - (j - i)S_m/m - \tfrac12\delta_0(j - i)(1 - (j - i)/m)]$.

Levin and Kline (1984) discuss Bernoulli and Poisson data; and in that context an
important aspect of their test is their proposal to use a conditional distribution given $S_m$,
which under $H_0$ does not depend on the unknown $\mu_0$, to compute a significance level. In
the Gaussian case under discussion here the conditional and unconditional distributions are
the same. Since (3.20) is somewhat easier to study than (3.21), an interesting question is
to what extent the two statistics behave similarly. Presumably they do if the duration of
the epidemic, $\rho_2 - \rho_1$, is small compared to $m$, but not in general.
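
The three statistics can be computed as follows. This is a minimal sketch with hypothetical function names: (3.19) and (3.20) are maxima of a random walk over its running minimum and so need only one pass, while (3.21) is evaluated by a direct search over all pairs $i < j$ because its quadratic term in $j - i$ rules out the running-minimum shortcut.

```python
def partial_sums(x):
    """Return [S_0, S_1, ..., S_m] with S_0 = 0."""
    sums = [0.0]
    for value in x:
        sums.append(sums[-1] + value)
    return sums


def z1(x, mu0, delta0):
    """Statistic (3.19) in O(m) time, using a running minimum over i < j."""
    best, walk, running_min = float("-inf"), 0.0, 0.0
    for value in x:
        walk += value - mu0 - 0.5 * delta0      # S_j - j*mu0 - j*delta0/2
        best = max(best, walk - running_min)    # minimum is over i = 0, ..., j - 1
        running_min = min(running_min, walk)
    return best


def z2(x, delta0):
    """Statistic (3.20): (3.19) with mu0 replaced by the sample mean S_m / m."""
    return z1(x, sum(x) / len(x), delta0)


def z3(x, delta0):
    """Statistic (3.21) by direct O(m^2) search over 0 <= i < j <= m."""
    m, s = len(x), partial_sums(x)
    return max(
        s[j] - s[i] - (j - i) * s[m] / m - 0.5 * delta0 * (j - i) * (1.0 - (j - i) / m)
        for i in range(m) for j in range(i + 1, m + 1)
    )


# Hypothetical observations with a raised stretch in the middle.
x = [0.2, -0.5, 1.5, 2.0, 1.0, -0.3, 0.1]
print(z1(x, 0.0, 1.0), z2(x, 1.0), z3(x, 1.0))
```

For $\delta_0 > 0$ each term of (3.21) exceeds the corresponding term of (3.20) by $\tfrac12\delta_0(j - i)^2/m$, so $Z_3 \ge Z_2$, with the gap small when the maximizing stretch is short relative to $m$.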

For a completely different problem which leads to consideration of (3.20) in the special
case $\delta_0 = 0$ and the simpler framework of continuous time, see Adler and Brown (1984).
For a different underlying random walk the probability that $Z_1$ in (3.19) is greater than $b$
can be interpreted as the probability that among the first $m$ customers of a GI/G/1 queue,
at least one has a waiting time exceeding $b$.


The  appearance  of  a  two-dimensional  indexing  set  in  (3.19)-(3.21),  corresponding  to 
the  unknown  onset  and  disappearance  of  the  epidemic,  makes  the  null  hypothesis  sampling 
distributions  of  these  statistics  quite  different  from  those  discussed  earlier.  Approxima¬ 
tions  to  the  power  function  seem  more  complicated  in  detail,  but  for  the  most  interesting 
range  of  parameter  values  do  not  seem  to  require  fundamentally  new  ideas.  The  remainder 
of  this  section  describes  some  promising  methods  for  approximating  the  null  hypothesis 
distributions  of  (3.19)-(3.21). 

We  begin  with  the  relatively  simple  (3.19),  which  gives  us  an  idea  of  what  we  can  hope 
to  achieve  in  the  more  complicated  (3.20)  and  (3.21). 

Let $y_1, y_2, \dots$ be independent, identically distributed random variables with $E(y_1) < 0$.
Let $S_n = y_1 + \cdots + y_n$ and for $b > 0$ define

(3.22)  $\tau = \tau(b) = \inf\{n : S_n - \min_{0 \le k \le n} S_k \ge b\}$.
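
The process $S_n - \min_{0 \le k \le n} S_k$ satisfies the recursion $W_n = \max(W_{n-1} + y_n, 0)$ with $W_0 = 0$, so the stopping time can be computed in a single pass. A minimal sketch with a hypothetical function name and made-up increments:

```python
def tau(y, b):
    """First n with S_n - min_{0<=k<=n} S_k >= b, or None if no such n <= len(y)."""
    w = 0.0
    for n, increment in enumerate(y, start=1):
        w = max(w + increment, 0.0)   # W_n = S_n - min_{0 <= k <= n} S_k
        if w >= b:
            return n
    return None


y = [-1.0, 0.5, 0.7, -0.2, 1.5]   # hypothetical increments
print(tau(y, 1.0), tau(y, 2.0), tau(y, 10.0))
```

The same recursion governs successive waiting times in a single-server queue, which is the interpretation of $Z_1$ mentioned above.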

The  following  inequality  is  useful  in  analyzing  (3.19). 

Proposition 3.23. Let $\tau = \tau(b)$ be defined by (3.22), $\tau_+ = \inf\{n : S_n > 0\}$, and
$T = \inf\{n : S_n \notin (0, b)\}$. Then

(3.24)  $P\{\tau(b) \le m\} \le P\{\tau_+ = \infty\}\,E\{(m - T + 1);\ T \le m,\ S_T \ge b\}$
        $\quad + \sum_{n=0}^{m-1} P\{n < \tau_+ < \infty\}\,P\{T \le m - n,\ S_T \ge b\}$.

Moreover, a lower bound for $P\{\tau(b) \le m\}$ is the right hand side of (3.24) divided by
$1 + E\{(m - T + 1);\ T \le m,\ S_T \ge b\}$.

Remark 3.25. With large deviation scaling and observations whose distribution can be
imbedded in an exponential family, one can use likelihood ratio identities similar to, but
simpler than, those developed in Part 2 to obtain first or second order asymptotic
approximations to $P\{\tau(b) \le m\}$. For example, for the normal case in (3.19), if
$m\exp(-\delta_0 b) \to 0$ and $\delta_0 m/2b - 1$ is bounded below by some positive number, then

(3.26)  $P_{\mu_0}\{Z_1 \ge b\} \sim \delta_0(m\delta_0/2 - b)\,\nu^2(\delta_0)\exp(-\delta_0 b)$,

where  v  is  given  by  (2.12).  If  in  fact  m3exp(-M)  — *  0,  then  the  error  in  (3.26)  is 
JT(£o)exp(-M)(l  +  o(l)),  where  K  is  very  complicated  to  evaluate  exactly,  but  satis- 


fies  K(S)  ~  2i>  £S/S  as  6  -*  0  (and  better  approximations  are  possible).  Details  of  this 
asymptotic  analysis  will  be  presented  elsewhere. 

Proof of Proposition 3.23. Let $\omega_n^+$ denote the $n$-shifted sample path, so
$S_k(\omega_n^+) = S_{n+k}(\omega) - S_n(\omega)$. The event $\{\tau \le m\}$ can be decomposed into a union of
disjoint events,

$$\{\tau \le m\} = \bigcup_{n=0}^{m-1}\{\tau > n,\ S_n = \min_{0 \le k \le n} S_k,\ T(\omega_n^+) \le m - n,\ S_T(\omega_n^+) \ge b\}.$$

Hence by independence,

$$P\{\tau \le m\} = \sum_{n=0}^{m-1} P\{\tau > n,\ S_n = \min_{0 \le k \le n} S_k\}\,P\{T \le m - n,\ S_T \ge b\}$$
$$\ge \sum_{n=0}^{m-1}[P\{S_k \le 0\ \forall\ k \le n\} - P\{\tau \le n\}]\,P\{T \le m - n,\ S_T \ge b\}$$
$$\ge \sum_{n=0}^{m-1}[P\{\tau_+ > n\} - P\{\tau \le m\}]\,P\{T \le m - n,\ S_T \ge b\}$$
$$= [P\{\tau_+ = \infty\} - P\{\tau \le m\}]\,E\{(m - T + 1);\ T \le m,\ S_T \ge b\}$$
$$\quad + \sum_{n=0}^{m-1} P\{n < \tau_+ < \infty\}\,P\{T \le m - n,\ S_T \ge b\}.$$

Rearranging gives the lower bound, and a similar, simpler argument gives the upper bound.

In principle the method sketched in Proposition 3.23 and Remark 3.25 to approximate
the distribution of (3.19) should also be applicable to (3.20). In this case because of
nonstationarity one must decompose $\{Z_2 \ge b\}$ not only according to the location of a (relative)
minimum of the sample path but also according to the value of the process at the minimum.
The details become much more complicated and are not pursued here. For the simpler case
of Brownian motion it is straightforward to obtain what one expects to be very good upper
bounds. Analyzing these asymptotically leads to the following conjecture.

Conjecture. Suppose $m \to \infty$, $b \to \infty$ such that for some fixed $0 < \zeta < \infty$ and
$-\infty < \xi_0 < \zeta$,

$$b/m = \zeta \quad\text{and}\quad \xi/m = \xi_0.$$

Then

(3.27)  $P_\xi^{(m)}\{\max_{0 \le s < t \le m}[W(t) - W(s)] \ge b\} = [2m^{-1}(2b - \xi)(b - \xi) + 1 + o(1)]\exp[-2m^{-1}b(b - \xi)]$.

One can give a rigorous proof of the leading term in (3.27) (cf. Theorem 3.28 below),
but the second term causes some difficulty. Since a standard reflection argument yields an
exact evaluation of the two-sided probability,

$$P_\xi^{(m)}\{\max_{0 \le s, t \le m}[W(t) - W(s)] \ge b\},$$

it is surprising that the one-sided problem should appear to be considerably more difficult.

An alternative method for approximating the null hypothesis distributions of (3.19)
and (3.20), which works equally well for (3.21), is that developed independently by Bickel
and Rosenblatt (1973) and Qualls and Watanabe (1973). (Both of these papers generalize
to multidimensional time parameters the method of Pickands (1969) for a linear time
parameter.) Since the general results of these authors give a tail probability for the maximum
of a Gaussian field in terms of an integral involving another complicated probability, it is
not immediately evident that the computational problem has been essentially simplified.
But for Gaussian fields built up from random walks (or Brownian motion) in sufficiently
simple ways one can use renewal theory to evaluate the required integrals in terms of the
function $\nu$ of (2.12). For illustration the tail behavior under $H_0$ of (3.20) and (3.21) is given
below.

Let $x_1, x_2, \dots$ be independent standard normal random variables, and put
$S_n = x_1 + \cdots + x_n$. Suppose $b$ and $m \to \infty$ in such a way that $m^{-1}b = \zeta$ is a fixed positive
constant.

Theorem  3.28.  For  ft  > 

**{  max  [Sj  -  Si  -  m_1(;  -  i)Sm  -  {j  -  »')ft)  >  &}  _ 

~  */2(2(2f  +  ft)](2m(2f  +  ft)(f  +  ft))exp[-2mj(?  +  ft)], 
where  v  is  given  by  (2.12). 

Theorem  3.29.  Let  ft  >  0.  Then  for  ft  >  4f 

P{o<maxjS,-  -  Si  -  m-'V  -  i)Sm  -  ft(;  -  t')(l  -  m~l{j  -  *))]  >  6} 

~  *'2(2ft)m^2  J  ~  f/ftj  exp(— 2mftf), 

while  for  0  <  ft  <  it  is  — 

~  «/a(ft  +  4?)4m(f  +  ft/4)#/J(*  -  ft/4)_I/J  exp(-2m(?  +  ft/4)2), 



where  v  is  given  by  (2.12). 


Note that in (3.26), Theorem 3.28, and Theorem 3.29 the function $\nu(\cdot)$ which accounts
for excess over the boundary is squared, basically because of the two dimensional time
parameter. Since typical values for $\nu(\cdot)$ are in the range .5 to .7, for these problems use of a
simple Brownian motion approximation, which replaces $\nu(\cdot)$ by 1, probably gives extremely
poor results.

The  preceding  discussion  is  only  a  beginning  attempt  to  study  the  problem  of  change- 
points  with  epidemic  alternative.  It  is  included  here  to  show  how  quickly  natural  generaliza¬ 
tions  of  previous  work  lead  into  new  territory,  requiring  new  ideas.  Two  obvious  questions 
are  (i)  how  good  are  these  approximations  and  (ii)  what  do  they  (presumably  in  conjunc¬ 
tion  with  approximations  for  the  power  function)  tell  us  about  the  relative  merits  of  (3.20) 
and  (3.21)?  Preliminary  Monte  Carlo  experiments  indicate  that  the  approximations  are  not 
nearly  so  accurate  for  small  m  as  those  given  in  Part  2,  although  the  derivations  of  (3.26) 
and  (3.27)  might  lead  one  to  expect  quite  good  approximations. 

Appendix 1
Proof  of  Theorem  3.11 




Both  Theorems  2.18  and  3.11  can  be  proved  by  the  method  of  Siegmund  (1982),  which 
requires rather lengthy analytic calculations. The somewhat different method used in this
paper  to  prove  Theorem  2.18  yields  a  considerably  simpler  proof  of  that  result,  so  one 
naturally  asks  how  well  it  adapts  to  related  problems.  As  we  shall  see  below,  it  gives  the 
appearance  of  a  relatively  computation  free  proof  of  Theorem  3.11.  However,  there  are 
some  technical  problems  which  seem  to  demand  additional  analytic  computation  for  their 
complete  solution. 

Let $T$ be defined by (3.9) and assume the conditions of Theorem 3.11. We also use the
notation from the proof of Theorem 2.18. Let

{A.  1)  p!~>  -  r  ifi'VWl  -  -  f/(l  -  l,)|>[(I  -  «,)/”*(, ]*/*«. 

J  — oo 

Also  let  J*  -  sup{n  :  n  <  mlf  \S*\  >  6(n(l  -  n/m)]1'2},  so  P^t]{T  <  »m}  =  P^l){T*  > 
mo}.  As  in  the  proof  of  Theorem  2.18  one  easily  calculates  the  likelihood  ratio  of  zn,  •  •  • ,  zm, 
under  relative  to  Pj~'*  and  obtains 

Ml  -  U)/{m  -  n)*i]‘/aexp  ^S2/n(l  -  n/m)  -  |f*/n*i(l  “  *»i/»»)]  » 

where $t_i = m_i/m$ ($i = 0, 1$). Hence Wald's likelihood ratio identity yields

{A  i)  ^’,)<r  “  [*<**  ~  *  -  (‘» 

=  £<m,){((rn  -  T*)/r]1'2  exp(-/C);  r  >  mo}, 

where  -  |[5f./T*(l  -  T*/m)  -  6*]. 

The  equation  (A.2)  is  analogous  to  (2.23)  in  the  proof  of  Theorem  2.18,  and  we  try  to 
evaluate  it  similarly.  Let 

$\tau = \inf\{n : (\xi + S_n)^2/(m_1 - n)(1 - m^{-1}(m_1 - n)) > b^2\}$

and

$R_m = \tfrac{1}{2}\{(\xi + S_\tau)^2/(m_1 - \tau)(1 - m^{-1}(m_1 - \tau)) - b^2\}$.
Past experience with large deviation scaling leads one to expect that $\tau/m \to t$ in probability
for some constant $t$. We consider the definition of $\tau$ expressed in the general form

(A.3)  $\tau = \inf\{n : m h(n/m, S_n/m) > 0\}$

and expand $h$ in a Taylor series about $(t, \mu t)$, where $\mu$ denotes the drift per unit time
of $S_n$ (from (A.1), $\mu = \xi_0/(1 - t_1)$). This shows that for $n$ close to the random time $\tau$,
$m h(n/m, S_n/m) = m(h - t h_1 - \mu t h_2) + n h_1 + S_n h_2 + \cdots$, where $h_i$ denotes differentiation
with respect to the $i$th argument, and $h$, $h_1$, and $h_2$ are evaluated at $(t, \mu t)$. The heuristic
reasoning following (2.27) suggests that the higher order terms play no role in determining
the asymptotic distribution of $R_m$, which is thus obtained by applying (2.8) to $n h_1 + S_n h_2$.
Although this conjecture is correct and allows one to obtain easily the result claimed in
the statement of Theorem 3.11, there are two technicalities making a rigorous proof more
difficult. (i) Unlike the situation in Theorem 2.18, the process $S_{m_1} - S_n$, $n = m_1$,
$m_1 - 1, \cdots$ is not a random walk under the conditional probability, so the renewal theorem
is not directly applicable. (ii) Even if it were a random walk, the technical conditions of
Lai and Siegmund (1977) are not fulfilled, and no minor change in their argument produces
an appropriate result.
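
The sense in which the higher order terms "play no role" can be illustrated numerically. The Python sketch below uses a hypothetical choice $h(t, x) = x^2 - t$ with $\mu = 1$ (so that $t = 1$), which is not the function arising in Theorem 3.11. It computes the difference between $m h(n/m, S_n/m)$ and the first order expansion $m(h - t h_1 - \mu t h_2) + n h_1 + S_n h_2$, and shows that although this difference need not be small at a distance of order $m^{1/2}$ from $(mt, m\mu t)$, it changes very little over a single step when $m$ is large.

```python
def linearization_error(m, n, s_n, h, h1, h2, t, mu):
    """Exact m*h(n/m, s_n/m) minus its first order expansion about (t, mu*t)."""
    h0, d1, d2 = h(t, mu * t), h1(t, mu * t), h2(t, mu * t)
    exact = m * h(n / m, s_n / m)
    linear = m * (h0 - t * d1 - mu * t * d2) + n * d1 + s_n * d2
    return exact - linear


# Hypothetical example: h(t, x) = x**2 - t, with partial derivatives h1 and h2.
def h(t, x):
    return x * x - t


def h1(t, x):
    return -1.0


def h2(t, x):
    return 2.0 * x


def one_step_change(m, step=1.0, t=1.0, mu=1.0):
    """Change in the linearization error over one step of the walk, starting
    from a point displaced by m**0.5 from the centre of the expansion."""
    n = int(m * t)
    s = m * mu * t + m ** 0.5
    before = linearization_error(m, n, s, h, h1, h2, t, mu)
    after = linearization_error(m, n + 1, s + step, h, h1, h2, t, mu)
    return after - before


if __name__ == "__main__":
    for m in (100, 10_000, 1_000_000):
        print(m, one_step_change(m))
```

For this choice of $h$ the error equals $(S_n - m)^2/m$, so its one-step change is of order $m^{-1/2}$, in line with the heuristic that the quadratic part is slowly changing.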

Fortunately  (ii)  is  solved  by  Hogan  (1984),  who  considers  nonlinear  renewal  theory  for 
processes  of  the  form  (A.3).  See  Appendix  2.  To  circumvent  (i),  note  that  by  (A.l)  and 
(A.2)  it  suffices  to  evaluate 

^{(m-n/ri^expf-JC);  T*  >  mo} 

=  ^o”-f{[(m  ~  mi  +  r)/(w»i  -  r)]1/2  exp(-Pm);  r  <  mt  -  m0} 

and  then  integrate  out  X.  For  X  of  the  form 

X  =  m£ o/(l  -  «i)  +  nm1/2, 

one can easily calculate the likelihood ratio of $x_1, \cdots, x_n$ under the conditional
probability relative to the unconditional probability $P_\mu$ ($\mu = \xi_0/(1 - t_1)$), which has
essentially the same drift per unit time, to obtain


-  "»i  +  r)/(mi  “  r)]1/2  exp (-/?*);  r  <  mx  -  mo} 

=  ^e|{[mi(m  -  mi  +  r)]1/J/("»i  “  r)}  «P  -  \(8f^ %r)2/(rm  -  r) 

+  2i i(Sr  -  pr)/m1/2  -  j ti2r/{mx  -  r)  ;  r  <  m,  -  m0  J. 

It is now straightforward, but tedious, to use the asymptotic degeneracy of $\tau/m$, the
asymptotic normality of $(S_\tau - \mu\tau)/\tau^{1/2}$, its asymptotic independence from $R_m$ (as in Lemma
2.16) and the $P_\mu$-limiting distribution of $R_m$ given by Hogan's (1984) nonlinear renewal
theorem to evaluate this expectation and hence complete the proof of Theorem 3.11.


Appendix  2 

Nonlinear  Renewal  Theory 

In Section 2.3 the renewal theorem was used to approximate the distribution of the
excess of a random walk at its first passage across a linear boundary. In Theorems 2.18 and
3.11 similar problems arose with regard to first passages to nonlinear boundaries. In this
appendix we survey the appropriate nonlinear renewal theory.

The  problem  is  complicated  by  the  fact  that  the  stopping  time  and  the  excess  over  the 
boundary  usually  can  be  defined  in  more  than  one  way.  Conceptually  the  simplest  situation 
involves  a  stopping  time  of  the  form 

(A.4)  $T = T_m = \inf\{n : S_n > m c(n/m)\}$.

Here $c(\cdot)$ is a positive continuous function, $S_n$, $n = 1, 2, \cdots$ is a random walk with positive
drift $\mu = E(S_1)$, and we assume that the ray $\mu t$ crosses the curve $c(t)$ at exactly one point, $t$,
near which $c(\cdot)$ is twice continuously differentiable. The stopping rules $\tau^*$ and $\tau$ introduced
in the proofs of Theorems 2.18 and 3.11 respectively are both essentially of this form.

It follows from an argument based on the strong law of large numbers that $T_m/m \to t$
with probability one as $m \to \infty$. Since as $m \to \infty$ the curve $m c(n/m)$ for $n$ close to $mt$
flattens out, it is natural to conjecture that the excess over the curved boundary, $R_m =
S_T - m c(T/m)$, converges in law to the same limit as the excess over the tangent to $m c(\cdot)$
at the point $t$, which is given by (2.8) with $S_n$ replaced by $S_n - n c'(t)$.
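
The convergence $T_m/m \to t$ is easy to observe by simulation. The following Python sketch assumes normal increments with mean $\mu = 1$ and unit variance and the illustrative boundary $c(t) = t^{1/2}$, for which the ray $\mu t$ meets the curve at $t = 1$. These choices, and the function names, are made only for the purpose of illustration.

```python
import random


def first_passage(m, mu, c, rng, max_steps):
    """Return (T, excess) for T = inf{n : S_n > m*c(n/m)}.

    Returns (None, None) if the boundary is not crossed within max_steps.
    """
    s = 0.0
    for n in range(1, max_steps + 1):
        s += rng.gauss(mu, 1.0)
        boundary = m * c(n / m)
        if s > boundary:
            return n, s - boundary
    return None, None


def mean_scaled_time(m, mu, c, reps, seed):
    """Average of T_m / m over independent simulated paths that cross."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(reps):
        t_m, _ = first_passage(m, mu, c, rng, max_steps=10 * m)
        if t_m is not None:
            total += t_m / m
            count += 1
    return total / count


def sqrt_boundary(t):
    """Illustrative boundary c(t) = t**0.5; with mu = 1 the crossing point is t = 1."""
    return t ** 0.5


if __name__ == "__main__":
    for m in (100, 400, 1600):
        print(m, mean_scaled_time(m, 1.0, sqrt_boundary, reps=200, seed=1))
```

With these choices the fluctuations of $T_m/m$ about $t = 1$ are of order $m^{-1/2}$, so the averages should settle near 1 as $m$ grows.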

Although the conjecture of the preceding sentence is true under quite general conditions,
in special cases it follows from a somewhat different result, which is considerably
easier to prove. When possible, it is convenient to rewrite (A.4) in the form

(A.5)  $T = T_a = \inf\{n : n g(n^{-1} S_n) > a\}$

for suitable $g$ and $a$ (depending on $m$). For example, for $c(t) = c_0 t^\gamma$ ($0 < \gamma < 1$) in (A.4),
we find that $g(x) = (x^+)^{(1-\gamma)^{-1}}$ and $a = c_0^{(1-\gamma)^{-1}} m$. For the stopping time (A.5), the
excess over the boundary is $R_a = T g(S_T/T) - a$. A Taylor series expansion of $g$ yields

$n g(S_n/n) = n g(\mu) + (S_n - n\mu) g'(\mu) + (S_n - n\mu)^2 g''(\zeta_n)/2n$,
where $\zeta_n$ satisfies $|\zeta_n - \mu| \le |n^{-1} S_n - \mu|$. If $g(\mu) > 0$ (as we shall assume), the linear part
of $n g(S_n/n)$ is a random walk increasing at a rate proportional to $n$, whereas the quadratic
part is essentially constant. This leads one to suspect that the limiting distribution of $R_a$
is given by (2.8) with $S_n$ replaced by $n g(\mu) + (S_n - n\mu) g'(\mu)$. This conjecture is also true and has
been given an abstract formulation by Lai and Siegmund (1977). They consider stopping
rules of the form

(A.6)  $T = \inf\{n : S_n + \eta_n > a\}$,

where $S_n$, $n = 1, 2, \cdots$ is a non-arithmetic random walk with positive mean $\mu = E S_1$ and $\eta_n$
changes sufficiently slowly, in a sense made precise below, that it plays no role in determining
the limiting distribution of $S_T + \eta_T - a$ as $a \to \infty$. A typical application is to prove that
the limiting distribution of $R_a$ is as indicated above. Lai and Siegmund also apply their
result to approximate the significance level, (1.2) with $\mu = 0$, of a repeated significance
test. Lalley (1983) extends the Lai-Siegmund method to the much more difficult case of
multiparameter exponential families.
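
The passage from (A.4) to (A.5) and the splitting of $n g(S_n/n)$ into a linear and a quadratic part can be checked numerically. The Python sketch below assumes the power boundary $c(t) = c_0 t^\gamma$ with $g(x) = (x^+)^{1/(1-\gamma)}$ and $a = c_0^{1/(1-\gamma)} m$, and normal increments with unit variance; all function names are illustrative.

```python
import random


def random_walk(n_steps, mu, seed):
    """Partial sums S_1, ..., S_n of normal(mu, 1) increments; path[n-1] = S_n."""
    rng = random.Random(seed)
    s, path = 0.0, []
    for _ in range(n_steps):
        s += rng.gauss(mu, 1.0)
        path.append(s)
    return path


def stopping_time_a4(path, m, c0, gamma):
    """T = inf{n : S_n > m*c(n/m)} with c(t) = c0 * t**gamma."""
    for n, s in enumerate(path, start=1):
        if s > m * c0 * (n / m) ** gamma:
            return n
    return None


def stopping_time_a5(path, m, c0, gamma):
    """T = inf{n : n*g(S_n/n) > a}, g(x) = max(x, 0)**(1/(1-gamma)), a = c0**(1/(1-gamma)) * m."""
    power = 1.0 / (1.0 - gamma)
    a = c0 ** power * m
    for n, s in enumerate(path, start=1):
        if n * max(s / n, 0.0) ** power > a:
            return n
    return None


def taylor_split(s_n, n, mu, gamma):
    """Return (linear part, remainder) of n*g(S_n/n) expanded about mu."""
    power = 1.0 / (1.0 - gamma)
    g_mu = mu ** power
    g_prime_mu = power * mu ** (power - 1.0)
    linear = n * g_mu + (s_n - n * mu) * g_prime_mu
    remainder = n * max(s_n / n, 0.0) ** power - linear
    return linear, remainder


if __name__ == "__main__":
    path = random_walk(2000, mu=1.0, seed=3)
    print(stopping_time_a4(path, 200, 1.0, 0.5), stopping_time_a5(path, 200, 1.0, 0.5))
    print(taylor_split(path[199], 200, mu=1.0, gamma=0.5))
```

For $\gamma = 1/2$ one has $g(x) = x^2$ on $x > 0$, so the remainder is exactly $(S_n - n\mu)^2/n$, which stays of order one while the linear part grows like $n$.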

Although  the  stopping  rule  (1.4)  of  a  repeated  likelihood  ratio  test  in  an  exponential 
family  is  of  the  form  (A.5),  the  arguments  used  in  this  paper  to  prove  Theorems  2.18  and  3.11 
introduce auxiliary stopping rules which cannot be put into that form. Actually $\tau^*$ defined
in  (2.28)  to  prove  Theorem  2.18  is  almost  of  the  abstract  form  (A.6)  with  Sn  —  (£l+Sn, 

fjn  =  |m-1  (q1  S *,  and  a  =  1  m(pf  -  {£),  except  that  Lai  and  Siegmund  do  not  permit 

$\eta_n$ to depend on $a$. A suitable essentially trivial extension, modeled on Woodroofe's (1982)
reformulation  of  the  Lai-Siegmund  result,  is  given  below.  This  performs  the  dual  function 
of  completing  the  proof  of  Theorem  2.18  and  of  explaining  the  general  nature  of  this  class 
of  results.  Then  we  discuss  briefly  the  method  used  by  Hogan  (1984)  to  deal  with  stopping 
times  of  the  form  (A.3)  or  (A.4). 

Theorem A.7. Let $T$ be defined by (A.6), where $S_n$, $n = 1, 2, \cdots$ is a non-arithmetic
random walk with positive mean $\mu = E(S_1)$ and for all $n$, $\eta_n = \eta_n(a)$ is a measurable
function of $S_1, \cdots, S_n$. Suppose also that for each $A > 0$

Kbl-^0  (a  —  oo) 


[Figure 7. Sample paths of $S_n + \eta_n$ and of $S_{n_1} + \eta_{n_1} + (S_n - S_{n_1})$.]


and for each $A, \epsilon > 0$ there exists $\delta = \delta(A, \epsilon)$ such that


(A.8) 


Then as $a \to \infty$, for all $x > 0$

$P\{T < \infty, S_T + \eta_T - a \le x\} \to H(x)$,

where $H$ is defined to be the right hand side of (2.8).

With the help of Theorem A.7 one can easily complete the proof of Theorem 2.18. The
critical condition in the statement of Theorem A.7 is (A.8). The method of proof involves
conditioning on $S_{n_1} + \eta_{n_1}$, where $n_1$ is chosen so that $S_{n_1} + \eta_{n_1}$ is already close enough to
$a$ that by (A.8) $\eta_n$ is constant (to within $\epsilon$) for all $n_1 \le n \le T$, but it is far enough from
$a$ that the renewal theorem applies to the random walk $S_n - S_{n_1}$, $n = n_1 + 1, \cdots$. Hence
except for an event of arbitrarily small probability $S_n + \eta_n$ and $S_{n_1} + \eta_{n_1} + (S_n - S_{n_1})$ cross $a$
at the same time and have the same excess, to within $\epsilon$. See Figure 7. The renewal theorem
gives the indicated limiting distribution of excess over the boundary for the second process
and hence for the process of interest.
It is easy to see where this argument runs into difficulty in dealing with a stopping rule
of the form (A.4) or, what is essentially the same, (A.3). If we assume that $E S_1^2 < \infty$, the
variability in $S_n$ is $O(n^{1/2})$. Hence to have probability close to one that $S_{n_1} < m c(n_1/m)$,
one must choose $n_1 = mt - K m^{1/2}$ for some large value of $K$. But a Taylor expansion
shows that $m c(n/m)$ and the tangent $m c(t) + c'(t)(n - mt)$ are essentially the same only for
$|n - mt| < \delta m^{1/2}$ for small $\delta$. To circumvent this difficulty one can introduce the auxiliary
stopping time

$T_1 = \inf\{n : S_n > m c(n/m) - \delta m^{1/2}\}$.

From the fact that $m^{-1} T_m \to t$ and the assumption $E S_1^2 < \infty$, it follows that with probability
approaching one

$S_{T_1} - [m c(T_1/m) - \delta m^{1/2}] \le \max_i (S_i - S_{i-1})^+ + \sup|c'(t)|$

is $o(m^{1/2})$ and hence $m c(T_1/m) - S_{T_1}$ is large. Moreover, during the approximately $\delta m^{1/2}/
[\mu - c'(t)]$ additional steps the random walk requires to cross the curve, the distance between
the curve and its tangent is small, provided $\delta$ is small. Hence the Lai-Siegmund argument
with the random time $T_1$ in place of $n_1$ shows that the time at which the random walk
crosses the curve and the excess over the curve are with high probability equal to the time
it crosses the tangent and almost equal to the excess over the tangent. Thus the nonlinear
problem is reduced to a linear one having an answer given by (2.8) with $S_n$ replaced by $S_n - n c'(t)$.
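
The reduction to a linear problem can be illustrated by simulation. The Python sketch below assumes unit-variance normal increments with drift $\mu = 1$ and the illustrative boundary $c(t) = t^{1/2}$, so that $t = 1$ and $c'(t) = 1/2$. It compares the average excess over the curved boundary $m c(n/m)$ with the average excess of a random walk with drift $\mu - c'(t)$ over a distant constant level. The function names, the level, and the sample sizes are arbitrary choices made for illustration.

```python
import random


def excess_over_curve(m, mu, c, rng):
    """Excess S_T - m*c(T/m) at T = inf{n : S_n > m*c(n/m)}."""
    s, n = 0.0, 0
    while True:
        n += 1
        s += rng.gauss(mu, 1.0)
        boundary = m * c(n / m)
        if s > boundary:
            return s - boundary


def excess_over_line(level, drift, rng):
    """Excess over a constant level of a unit-variance normal walk with positive drift."""
    s = 0.0
    while s <= level:
        s += rng.gauss(drift, 1.0)
    return s - level


def compare_mean_excess(m=200, level=50.0, reps=1500, seed=0):
    """Return (mean excess over the curve, mean excess over the line)."""
    rng = random.Random(seed)
    mu, slope = 1.0, 0.5           # slope = c'(t) at t = 1 for c(t) = t**0.5
    curve = [excess_over_curve(m, mu, lambda t: t ** 0.5, rng) for _ in range(reps)]
    line = [excess_over_line(level, mu - slope, rng) for _ in range(reps)]
    return sum(curve) / reps, sum(line) / reps


if __name__ == "__main__":
    print(compare_mean_excess())
```

Agreement of the two averages to within simulation error is what the argument above leads one to expect; a fuller check would compare the two empirical distributions rather than only their means.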

This  argument  is  easily  made  precise  and  also  extended  along  the  lines  of  Lemma  2.16. 
The  result  provides  an  appropriate  tool  for  completing  the  proof  of  Theorem  3.11,  or  of 
Theorem  2.18  for  that  matter. 

Hogan (1984) develops a much more sophisticated version of the argument for stopping
times of the form (A.3). He does not require that $E S_1^2 < \infty$, and in his definition of $T_1$ he
replaces $\delta m^{1/2}$ by a large constant $K$. This minimizes the smoothness conditions imposed
on $h$ (or $c(\cdot)$). More importantly, however, Hogan's method also proves a nonlinear renewal
theorem in problems scaled for a diffusion approximation, where the methods described
here and also Woodroofe's (1976a) method fail completely.

It  would  be  interesting  to  give  an  abstract  formulation  of  Hogan’s  result  for  a  stopping 